Abstract

We consider the problem of learning the structure of Ising models (pairwise binary Markov 
random fields) from i.i.d. samples. While several methods have been proposed to accomplish this 
task, their relative merits and limitations remain somewhat obscure. By analyzing a number of 
concrete examples, we show that low-complexity algorithms often fail when the Markov random 
field develops long-range correlations. More precisely, this phenomenon appears to be related to
the Ising model phase transition (although it does not coincide with it).



1 Introduction and main results

Given a graph $G = (V = [p], E)$, and a positive parameter $\theta > 0$, the ferromagnetic Ising model on
$G$ is the pairwise Markov random field

$$\mu_{G,\theta}(x) = \frac{1}{Z_{G,\theta}} \prod_{(i,j)\in E} e^{\theta x_i x_j} \qquad (1)$$

over binary variables $x = (x_1, x_2, \dots, x_p)$, $x_i \in \{+1, -1\}$. Apart from being one of the best studied
models in statistical mechanics [2], the Ising model is a prototypical undirected graphical model.
Since the seminal work of Hopfield [3] and Hinton and Sejnowski [1], it has found application in
numerous areas of machine learning, computer vision, clustering and spatial statistics.

The obvious generalization of the distribution (1) to edge-dependent parameters $\theta_{ij}$, $(i,j) \in E$, is
of central interest in such applications, and will be introduced in Section 2.2.2. Let us stress that we
follow the statistical mechanics convention of calling (1) an Ising model even if the graph $G$ is not a
grid.

In this paper we study the following structural learning problem: 

Given $n$ i.i.d. samples $x^{(1)}, x^{(2)}, \dots, x^{(n)} \in \{+1,-1\}^p$ with distribution $\mu_{G,\theta}(\,\cdot\,)$, reconstruct the graph $G$.

For the sake of simplicity, we assume in most of the paper that the parameter $\theta$ is known, and that
$G$ has no double edges (it is a 'simple' graph). We therefore focus on the key challenge of learning
the graph structure associated to the measure $\mu_{G,\theta}(\cdot)$. This structure is particularly important
for extracting the qualitative features of the model, since it encodes its conditional independence
properties.

It follows from the general theory of exponential families that, for any $\theta \in (0,\infty)$, the model (1)
is identifiable [5]. In particular, the structural learning problem is solvable with unbounded sample
complexity and computational resources. The question we address is: for which classes of graphs
and values of the parameter $\theta$ is the problem solvable under realistic complexity constraints? More
precisely, given an algorithm Alg, a graph $G$, a value $\theta$ of the model parameter, and a small $\delta > 0$,
the sample complexity is defined as

$$n_{\rm Alg}(G,\theta) = \inf\big\{ n \in \mathbb{N} : \mathbb{P}_{n,G,\theta}\{{\rm Alg}(x^{(1)}, \dots, x^{(n)}) = G\} \ge 1 - \delta \big\}, \qquad (2)$$

where $\mathbb{P}_{n,G,\theta}$ denotes probability with respect to $n$ i.i.d. samples with distribution $\mu_{G,\theta}$. Further, we
let $\chi_{\rm Alg}(G,\theta)$ denote the number of operations of the algorithm Alg, when run on $n_{\rm Alg}(G,\theta)$ samples.
The general problem is therefore to characterize the functions $n_{\rm Alg}(G,\theta)$ and $\chi_{\rm Alg}(G,\theta)$, and to design
algorithms that minimize the complexity.
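Definition (2) can be estimated by simulation whenever one can draw samples from $\mu_{G,\theta}$. The Python sketch below is our own illustration, not an algorithm from this paper: `alg` (samples to graph) and `sample` (draws $n$ samples) are hypothetical user-supplied callables, and the function scans a grid of sample sizes for the smallest $n$ whose empirical success frequency reaches $1-\delta$. Because of Monte Carlo error, the returned value is only an estimate of $n_{\rm Alg}(G,\theta)$.

```python
import numpy as np


def estimate_sample_complexity(alg, sample, G, n_grid, delta=0.1, trials=200, seed=0):
    """Monte Carlo estimate of n_Alg(G, theta) in Eq. (2).

    alg(X)         -> estimated graph (compared with G using ==)
    sample(n, rng) -> (n, p) array of +/-1 samples from mu_{G,theta}
    Returns the smallest n in n_grid whose empirical success frequency
    is at least 1 - delta, or None if no grid point qualifies.
    """
    rng = np.random.default_rng(seed)
    for n in sorted(n_grid):
        # Count how many independent runs return exactly the true graph G.
        successes = sum(alg(sample(n, rng)) == G for _ in range(trials))
        if successes / trials >= 1 - delta:
            return n
    return None
```

With a finite number of trials the empirical success frequency need not be monotone in $n$, so in practice a finer grid or more trials may be needed.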

Let us emphasize that these are not the only possible definitions of sample and computational complexity. Alternative definitions are obtained by requiring that the reconstructed structure
${\rm Alg}(x^{(1)}, \dots, x^{(n)})$ is only partially correct. However, for the algorithms considered in this paper,
such definitions should not result in qualitatively different behavior.^1

General upper and lower bounds on the sample complexity $n_{\rm Alg}(G,\theta)$ were proved by Santhanam
and Wainwright [6, 7], without however taking into account computational complexity. At the other
end of the spectrum, several low-complexity algorithms have been developed in the last few years (see
Section 1.3 for a brief overview). However, the resulting sample complexity bounds only hold under
specific assumptions on the underlying model (i.e. on the pair $(G,\theta)$). A general understanding of
the trade-offs between sample complexity and computational complexity is largely lacking.

This paper is devoted to the study of the tradeoff between sample complexity and computational
complexity for some specific structural learning algorithms, when applied to the Ising model. An
important challenge is that the model (1) induces subtle correlations between the
binary variables $(x_1, \dots, x_p)$. The objective of a structural learning algorithm is to disentangle pairs
$x_i, x_j$ that are conditionally independent given the other variables (and hence are not connected by
an edge) from those that are instead conditionally dependent (and hence connected by an edge in
$G$). This becomes particularly difficult when $\theta$ is large, and hence pairs $x_i, x_j$ that are not
connected by an edge in $G$ become strongly dependent. The next section sets the stage for our work
by discussing a simple and concrete illustration of this phenomenon.

1.1 A toy example 

As a toy illustration^2 of the challenges of structural learning, we will study the two families of graphs
in Figure 1. The two families will be denoted by $\{G_p\}_{p\ge 3}$ and $\{G'_p\}_{p\ge 3}$ and are indexed by the
number of vertices $p$.

Graph $G_p$ has $p$ vertices and $2(p-2)$ edges. Two of the vertices (vertex 1 and vertex 2) have degree
$(p-2)$, and $(p-2)$ have degree 2. Graph $G'_p$ also has $p$ vertices, but only one edge between vertices
1 and 2. In other words, graph $G'_p$ corresponds to variables $x_1$ and $x_2$ interacting 'directly' (and
hence not conditionally independent), while graph $G_p$ describes a situation in which the two variables


^1Indeed, the algorithms considered in this paper reconstruct $G$ by separately estimating the neighborhood of each
node $i$. This implies that any significant probability of error results in a substantially different graph.

^2A similar example was considered in [8].
Figure 1: Two families of graphs $G_p$ and $G'_p$ whose distributions $\mu_{G_p,\theta}$ and $\mu_{G'_p,\theta'}$ merge as $p$ gets
large.



interact 'indirectly' through numerous weak intermediaries (but are still conditionally independent
since they are not connected). Fix $p$, and assume that one of $G_p$ or $G'_p$ is chosen randomly and i.i.d.
samples $x^{(1)}, \dots, x^{(n)}$ from the corresponding Ising distribution are given to us.

Can we efficiently distinguish the two graphs, i.e. infer whether the samples were generated using
$G_p$ or $G'_p$? As mentioned above, since the model is identifiable, this task can be achieved with
unbounded sample and computational complexity. Further, since model (1) is an exponential family,
the $p \times p$ matrix of empirical covariances $(1/n)\sum_{\ell=1}^{n} x^{(\ell)}(x^{(\ell)})^{\sf T}$ provides a sufficient statistic for
inferring the graph structure.

In this specific example, we assume that different edge strengths are used in the two graphs: $\theta$ for
graph $G_p$ and $\theta'$ for graph $G'_p$ (i.e. we have to distinguish between $\mu_{G_p,\theta}$ and $\mu_{G'_p,\theta'}$). We claim that,
by properly choosing the parameters $\theta$ and $\theta'$, we can ensure that the covariances approximately
match: $|\mathbb{E}_{G_p,\theta}\{x_ix_j\} - \mathbb{E}_{G'_p,\theta'}\{x_ix_j\}| = O(1/\sqrt{p})$. Indeed the same remains true for all marginals
involving a bounded number of variables. Namely, for all subsets of vertices $U \subseteq [p]$ of bounded size,
$|\mu_{G_p,\theta}(x_U) - \mu_{G'_p,\theta'}(x_U)| = O(1/\sqrt{p})$. Low-complexity algorithms typically estimate each edge using
only a small subset of low-dimensional marginals. Hence, they are bound to fail unless the number of
samples $n$ diverges with the graph size $p$. On the other hand, a naive information-theoretic lower
bound (in the spirit of [6, 7]) only yields $n_{\rm Alg}(G,\theta) = \Omega(1)$. This sample complexity is achievable by
using global statistics to distinguish the two graphs.

In other words, even for this simple example, a dichotomy emerges: either the number of samples
has to grow with the number of parameters, or algorithms have to exploit a large number of marginals
of $\mu_{G,\theta}$.

To confirm our claim, we need to compute the covariance of the Ising distributions
$\mu_{G_p,\theta}$, $\mu_{G'_p,\theta'}$. We easily obtain, for the latter graph,

$$\mathbb{E}_{G'_p,\theta'}\{x_1x_2\} = \tanh\theta', \qquad (3)$$

$$\mathbb{E}_{G'_p,\theta'}\{x_ix_j\} = 0, \qquad (i,j) \ne (1,2). \qquad (4)$$

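The calculation is somewhat more intricate for graph $G_p$, so we defer complete formulae to an appendix
and report here only the result for $p \gg 1$, $\theta \ll 1$:

$$\mathbb{E}_{G_p,\theta}\{x_1x_2\} = \tanh\{p\theta^2 + O(p\theta^4, \theta^2)\}, \qquad (5)$$

$$\mathbb{E}_{G_p,\theta}\{x_ix_j\} = O(\theta), \qquad i \in \{1,2\},\ j \in \{3,\dots,p\}, \qquad (6)$$

$$\mathbb{E}_{G_p,\theta}\{x_ix_j\} = O(\theta^2), \qquad i,j \in \{3,\dots,p\}. \qquad (7)$$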



In other words, variables $x_1$ and $x_2$ are strongly correlated (although not connected), while all the
other variables are weakly correlated. By letting $\theta = \sqrt{\theta'/p}$ this covariance structure matches
Eqs. (3), (4) up to corrections of order $1/\sqrt{p}$.
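For $G_p$ the correlation of $x_1, x_2$ can in fact be written in closed form: summing out each of the $p-2$ intermediaries contributes a factor $2\cosh(\theta(x_1+x_2))$, which gives $\mathbb{E}_{G_p,\theta}\{x_1x_2\} = \tanh\big(\tfrac{p-2}{2}\log\cosh 2\theta\big)$; expanding $\log\cosh 2\theta = 2\theta^2 + O(\theta^4)$ is consistent with Eq. (5). The Python sketch below is our own check (vertices 1 and 2 of the text are relabelled 0 and 1): it compares this formula with brute-force enumeration. Enumeration costs $2^p$ terms, so it is only meant for toy sizes.

```python
import itertools
import math


def exact_corr_Gp(p, theta):
    """Brute-force E{x_1 x_2} on G_p: vertices 0 and 1 are joined to each
    of the intermediaries 2, ..., p-1 (2(p-2) edges, coupling theta)."""
    num = den = 0.0
    for x in itertools.product((-1, 1), repeat=p):
        energy = sum(theta * x[k] * (x[0] + x[1]) for k in range(2, p))
        w = math.exp(energy)
        num += x[0] * x[1] * w
        den += w
    return num / den


def closed_form_corr_Gp(p, theta):
    """tanh(((p - 2) / 2) * log cosh(2 theta)), obtained by summing out the intermediaries."""
    return math.tanh(0.5 * (p - 2) * math.log(math.cosh(2 * theta)))


# Matching scaling theta = sqrt(theta'/p): E{x_1 x_2} approaches tanh(theta') as p grows.
p, theta_prime = 12, 1.0
theta = math.sqrt(theta_prime / p)
print(exact_corr_Gp(p, theta), closed_form_corr_Gp(p, theta), math.tanh(theta_prime))
```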

Notice that the ambiguity between the two models $G_p$ and $G'_p$ arises because several weak, indirect
paths between $x_1$ and $x_2$ in graph $G_p$ add up to the same effect as a strong direct connection. This
toy example is hence suggestive of the general phenomenon that strong long-range correlations can
'fake' a direct connection. However, the example is not completely convincing for several reasons:

1. Most algorithms of interest estimate each edge on the basis of a large number of low-dimensional
marginals (for instance all pairwise correlations).

2. Reconstruction guarantees have been proved for graphs with bounded degree [9, 10, 11],
while here we are letting the maximum degree be as large as the system size. Notice however
that the graphs considered here are only sparse 'on average'.

3. It may appear that the difficulty in distinguishing graph $G_p$ from $G'_p$ is related to the fact that
in the former we take $\theta = O(1/\sqrt{p})$. This is however the natural scaling when the degree of
a vertex is large, in order to obtain a non-trivial distribution. If the graph $G_p$ had $\theta$ bounded
away from 0, this would result in a distribution $\mu_{G_p,\theta}(x)$ concentrated on the two antipodal
configurations: all-$(+1)$ and all-$(-1)$. Structural learning would be equally difficult in this
case.

Despite these points, this model already provides a useful counter-example. In Appendix D.3 we will
show why, even for bounded $p$ (and hence $\theta$ bounded away from 0), the model $G_p$ in Figure 1 'fools'
the regularized logistic regression algorithm of Ravikumar, Wainwright and Lafferty [11]. Regularized
logistic regression reconstructs $G'_p$ instead of $G_p$.

1.2 Outline of the paper 

The rest of this paper is devoted to bounding the sample complexity $n_{\rm Alg}$ and computational complexity $\chi_{\rm Alg}$ for a number of graph models, as a function of $\theta$. Results of this analysis are presented
in Section 2 for three algorithms: a simple thresholding algorithm, the conditional independence test
method of [10] and the penalized pseudo-likelihood method of [11]. In Section 3 we validate our
analysis through numerical simulations. Finally, Section 4 contains the proofs, with some technical
details deferred to the appendices.

This analysis unveils a general pattern: when the model (1) develops strong correlations, several
low-complexity algorithms fail, or require a large number of samples. What does 'strong correlations'
mean? As the toy example in the previous section demonstrates, correlations arise from a trade-off
between the degree (which we will characterize here via the maximum degree $\Delta$), and the interaction
strength $\theta$. They can be ascribed to a few strong connections (large $\theta$) or to a large number of weak
connections (large $\Delta$). Is there any meaningful way to compare and combine these quantities ($\theta$ and
$\Delta$)? An answer is suggested by the theory of Gibbs measures, which predicts a dramatic change of
behavior when $\theta$ crosses the so-called 'uniqueness threshold' $\theta_{\rm uniq}(\Delta) = {\rm atanh}(1/(\Delta-1))$ [12]. For
$\theta < \theta_{\rm uniq}(\Delta)$ Gibbs sampling mixes rapidly and far apart variables in $G$ are roughly independent
[13]. Vice versa, for any $\theta > \theta_{\rm uniq}(\Delta)$ there exist graph families on which Gibbs sampling is slow,
and far apart variables are strongly dependent [14]. While polynomial sampling algorithms exist for
all $\theta > 0$ [15], for $\theta < 0$, in the regime $|\theta| > \theta_{\rm uniq}(\Delta)$ sampling is arguably #P-hard [16]. Related
to the uniqueness threshold is also the phase transition threshold, which is graph dependent, with
typically $\theta_{\rm crit} \le {\rm const.}/\Delta$.
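The uniqueness threshold is easy to tabulate. The short Python sketch below (illustrative only) evaluates $\theta_{\rm uniq}(\Delta) = {\rm atanh}(1/(\Delta-1))$ together with $\Delta\,\theta_{\rm uniq}(\Delta)$, which tends to 1 as $\Delta$ grows; this is the sense in which the threshold scales as const./$\Delta$. For $\Delta = 4$ one gets exactly $\tfrac12\log 2 \approx 0.347$.

```python
import math


def theta_uniq(max_degree):
    """Uniqueness threshold atanh(1 / (Delta - 1)); finite only for Delta >= 3."""
    if max_degree < 3:
        raise ValueError("atanh(1/(Delta - 1)) is finite only for Delta >= 3")
    return math.atanh(1.0 / (max_degree - 1))


for deg in (3, 4, 10, 100):
    t = theta_uniq(deg)
    print(f"Delta={deg:4d}  theta_uniq={t:.4f}  Delta*theta_uniq={deg * t:.4f}")
```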

We will see that this is indeed a relevant way of comparing interaction strength and degree,
even for structural learning. All the algorithms we analyzed (mentioned above) provably fail for
$\theta \gg {\rm const.}/\Delta$, for a number of 'natural' graph families. Our work raises several fascinating questions,
the most important being the construction of structural learning algorithms with provable performance
guarantees in the strongly dependent regime $\theta \ge \theta_{\rm crit} \sim {\rm const.}/\Delta$. The question as to whether such an
algorithm exists is left open by the present paper (but see the next section for an overview of earlier
work).

Let us finally emphasize that we do not think that any of the specific families of graphs studied
in the present paper is intrinsically 'hard' to learn. For instance, we show below that the regularized
logistic regression method of [11] fails on random regular graphs, while it is easy to learn such graphs
using the simple thresholding algorithm of Section 2.1. The specific families were indeed chosen
mostly because they are analytically tractable.

1.3 Further related work 

Traditional algorithms for learning Ising models were developed in the context of Boltzmann machines
[17, 18]. These algorithms try to solve the maximum likelihood problem by gradient ascent.
Estimating the gradient of the log-likelihood function requires computing expectations with respect
to the Ising distribution. In these works, this was done using the Markov Chain Monte Carlo
(MCMC) method, and more specifically Gibbs sampling.

We shall not consider this approach in our study for two types of reasons. First of all, it does
not output a 'structure' (i.e. a sparse subset of the $\binom{p}{2}$ potential edges): because of approximation
errors, it yields non-zero values for all the edges. This problem can in principle be overcome by using
suitably regularized objective functions, but such a modified algorithm was never studied.

Second, the need to compute expectation values with respect to the Ising distribution, and the
use of MCMC to achieve this goal, poses some fundamental limitations. As mentioned above, the
Markov chain commonly used by these methods is simple Gibbs sampling. This is known to have
mixing time that grows exponentially in the number of variables for $\theta > \theta_{\rm uniq}(\Delta)$, and hence does not
yield good estimates of the expectation values in practice. While polynomial sampling schemes exist
for models with $\theta > 0$ [15], they do not apply to $\theta < 0$ or to general models with edge-dependent
parameters $\theta_{ij}$. Already in the case $\theta < 0$, estimating expectation values of the Ising distribution is
likely to be #P-hard [16].




Abbeel, Koller and Ng [9] first developed a method with computational complexity provably polynomial in the number of variables, for bounded maximum degree, and logarithmic sample complexity.
Their approach is based on ingenious use of the Hammersley-Clifford representation of Markov Random Fields. Unfortunately, the computational complexity of this approach is a power of $p$ whose exponent
grows with $\Delta$, which becomes impractical for reasonable values of the degree and network size (and super-polynomial for
$\Delta$ diverging with $p$). The algorithm by Bresler, Mossel and Sly [10] studied in Section 2.2.1 presents
similar limitations, which the authors overcome (in the small $\theta$ regime) by exploiting the correlation
decay phenomenon.

An alternative point of view consists in using standard regression methods. In the context of
Ising models, Ravikumar, Wainwright and Lafferty [11] showed that the neighborhood of a vertex
$i$ can be efficiently reconstructed by solving an appropriate regularized regression problem. More
precisely, the values of variable $x_i$ are regressed against the values of all the other variables. The
logistic regression log-likelihood is regularized by adding an $\ell_1$-penalty that promotes the selection of
sparse graph structures. We will analyze this method in Section 2.2.2. The approach of [11] extends
to non-Gaussian models earlier work by Meinshausen and Bühlmann [19]. Let us notice in passing
that the case of Gaussian graphical models is substantially easier since the log-likelihood of a given
model can be evaluated easily in this case [20].

A short version of this paper was presented at the 2009 Neural Information Processing Systems
symposium. Since then, at least two groups have explored the challenges put forward in our work. Anandkumar, Tan and Willsky [21] prove that, for sequences of random graphs which are sparse on average
(i.e. with bounded average degree), structural learning is possible throughout the correlation decay
regime $\theta < \theta_{\rm crit}$. This result generalizes our analysis of random regular graphs (see next section)
to the more challenging case of graphs with random degrees. Cocco and Monasson [22] proposed
an 'adaptive cluster' heuristic and demonstrated empirically good performance for specific graph
families, also for $\theta > \theta_{\rm crit}$. A mathematical analysis of their approach is lacking.



2 Results 

2.1 The simple thresholding algorithm 

In order to illustrate the interplay between graph structure, sample complexity and interaction
strength $\theta$, it is instructive to consider a simple example. The thresholding algorithm reconstructs
$G$ by thresholding the empirical correlations

$$\widehat{C}_{ij} \equiv \frac{1}{n}\sum_{\ell=1}^{n} x_i^{(\ell)} x_j^{(\ell)} \qquad (8)$$

for $i,j \in V$.

Thresholding( samples $\{x^{(\ell)}\}$, threshold $\tau$ )

1: Compute the empirical correlations $\{\widehat{C}_{ij}\}_{(i,j)\in V\times V}$;

2: For each $(i,j) \in V\times V$

3: If $\widehat{C}_{ij} > \tau$, set $(i,j) \in E$;
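A minimal Python sketch of Thr($\tau$), assuming the samples are stored as an $n \times p$ NumPy array of $\pm1$ entries. To have something to run it on, the helper `sample_path_ising` (ours, for illustration) draws exact samples on a path, which is a tree: without external field, each spin copies its parent with probability $(1+\tanh\theta)/2$. The threshold is the tree choice $\tau(\theta) = (\tanh\theta + \tanh^2\theta)/2$ of Theorem 2.1.

```python
import numpy as np


def empirical_correlations(X):
    """C_ij = (1/n) sum_l x_i^(l) x_j^(l), Eq. (8), for an (n, p) array of +/-1 samples."""
    X = np.asarray(X, dtype=float)
    return X.T @ X / X.shape[0]


def thresholding(X, tau):
    """Thr(tau): return the edge set {(i, j) : i < j, C_ij > tau}."""
    C = empirical_correlations(X)
    p = C.shape[0]
    return {(i, j) for i in range(p) for j in range(i + 1, p) if C[i, j] > tau}


def sample_path_ising(n, p, theta, rng):
    """Exact samples from the Ising model on the path 0-1-...-(p-1)."""
    X = np.empty((n, p), dtype=int)
    X[:, 0] = rng.choice([-1, 1], size=n)
    for k in range(1, p):
        keep = rng.random(n) < (1 + np.tanh(theta)) / 2   # copy the parent spin
        X[:, k] = np.where(keep, X[:, k - 1], -X[:, k - 1])
    return X


rng = np.random.default_rng(0)
theta = 0.8
X = sample_path_ising(5000, 6, theta, rng)
tau = (np.tanh(theta) + np.tanh(theta) ** 2) / 2          # Theorem 2.1 choice
print(sorted(thresholding(X, tau)))
```

With $n = 5000$ the output is expected, with high probability rather than certainty, to be the five path edges: edges have correlation $\tanh\theta \approx 0.66$, while non-edges have correlation at most $\tanh^2\theta \approx 0.44$.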



We will denote this algorithm by Thr($\tau$). Notice that its complexity is dominated by the computation of the empirical correlations, i.e. $\chi_{{\rm Thr}(\tau)} = O(p^2 n)$. The sample complexity $n_{{\rm Thr}(\tau)}$ can be
bounded for specific classes of graphs as follows (for proofs see Section 4.2).

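Theorem 2.1. If $G$ is a tree, and $\tau(\theta) = (\tanh\theta + \tanh^2\theta)/2$, then

$$n_{{\rm Thr}(\tau)}(G,\theta) \le \frac{32}{(\tanh\theta - \tanh^2\theta)^2}\log\frac{2p}{\delta}. \qquad (9)$$

Theorem 2.2. If $G$ has maximum degree $\Delta > 1$ and if $\theta < {\rm atanh}(1/(2\Delta))$ then there exists $\tau = \tau(\theta)$
such that

$$n_{{\rm Thr}(\tau)}(G,\theta) \le \frac{32}{(\tanh\theta - 1/(2\Delta))^2}\log\frac{2p}{\delta}. \qquad (10)$$

Further, the choice $\tau(\theta) = (\tanh\theta + 1/(2\Delta))/2$ achieves this bound.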

Theorem 2.3. There exists a numerical constant $K$ such that the following is true. If $\Delta > 3$ and
$\theta > K/\Delta$, there are graphs of bounded degree $\Delta$ such that for any $\tau$, $n_{{\rm Thr}(\tau)} = \infty$, i.e. the thresholding
algorithm always fails with high probability.

These results confirm the idea that the failure of low-complexity algorithms is related to long-range correlations in the underlying graphical model. If the graph $G$ is a tree, then correlations
between far apart variables $x_i$, $x_j$ decay exponentially with the distance between vertices $i$, $j$. Hence
trees can be learnt from $O(\log p)$ samples irrespective of their topology and maximum degree
(assuming $\theta < \infty$). The same happens on bounded-degree graphs if $\theta < {\rm const.}/\Delta$. However, for
$\theta > {\rm const.}/\Delta$, there exist families of bounded degree graphs with long-range correlations.
2.2 More sophisticated algorithms

In this section we characterize $\chi_{\rm Alg}(G,\theta)$ and $n_{\rm Alg}(G,\theta)$ for more advanced algorithms. We again
obtain very distinct behaviors of these algorithms depending on the strength of correlations. We
focus on two types of algorithms and only include the proof of our most challenging result, Theorem
2.8 (for the proof see Section 4.3).

In the following we denote by $\partial i$ the neighborhood of a node $i \in G$ ($i \notin \partial i$), and assume the
degree to be bounded: $|\partial i| \le \Delta$.

2.2.1 Local Independence Test 

A recurring approach to structural learning consists in exploiting the conditional independence structure encoded by the graph [9, 10, 23, 21].

Let us consider, to be definite, the approach of [10], specializing it to the model (1). Fix a vertex
$r$, whose neighborhood we want to reconstruct, and consider the conditional distribution of $x_r$ given
its neighbors, $\mu_{G,\theta}(x_r \mid x_{\partial r})$.^3 Any change of $x_i$, $i \in \partial r$, produces a change in this distribution which
is bounded away from 0. Let $U$ be a candidate neighborhood, and assume $U \subseteq \partial r$. Then changing
the value of $x_j$, $j \in U$, will produce a noticeable change in the marginal of $x_r$, even if we condition
on the remaining values in $U$ and in any $W$, $|W| \le \Delta$. On the other hand, if $U \not\subseteq \partial r$, then it is
possible to find $W$ (with $|W| \le \Delta$) and a node $i \in U$ such that changing its value after fixing all
other values in $U \cup W$ will produce no noticeable change in the conditional marginal. (Just choose
$i \in U \setminus \partial r$ and $W = \partial r \setminus U$.) This procedure allows us to distinguish subsets of $\partial r$ from other sets of
vertices, thus motivating the following algorithm.

Local Independence Test( samples $\{x^{(\ell)}\}$, thresholds $(\epsilon,\gamma)$ )
1: Select a node $r \in V$;

2: Set as its neighborhood the largest candidate neighborhood $U$ of
size at most $\Delta$ for which the score function Score($U$) $> \epsilon/2$;
3: Repeat for all nodes $r \in V$;

The score function Score($\cdot$) depends on $(\{x^{(\ell)}\}, \Delta, \gamma)$ and is defined as follows,

$$\min_{W,j}\ \max_{x_r, x_W, x_U}\ \Big|\mathbb{P}_{n,G,\theta}\{X_r = x_r \mid X_W = x_W, X_U = x_U\} - \mathbb{P}_{n,G,\theta}\{X_r = x_r \mid X_W = x_W, X_{U\setminus j} = x_{U\setminus j}, X_j = -x_j\}\Big|. \qquad (11)$$

In the minimum, $|W| \le \Delta$ and $j \in U$. In the maximum, the values must be such that

$$\mathbb{P}_{n,G,\theta}\{X_W = x_W, X_{U\setminus j} = x_{U\setminus j}, X_j = x_j\} > \gamma/2. \qquad (12)$$

$\mathbb{P}_{n,G,\theta}$ is the empirical distribution calculated from the samples $\{x^{(\ell)}\}_{\ell=1}^{n}$. We denote this algorithm
by Ind($\epsilon,\gamma$). The search over candidate neighbors $U$, the search for minima and maxima in the
computation of Score($U$), and the computation of $\mathbb{P}_{n,G,\theta}$ all contribute to $\chi_{\rm Ind}(G,\theta)$.
Both theorems that follow are consequences of the analysis of [10], hence their proofs are omitted.
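To make Eqs. (11)-(12) concrete, here is a brute-force Python sketch of Score($U$) and of Ind($\epsilon,\gamma$) for a single node $r$. It is exponential in $\Delta$ and in $|U|$ and is meant only for toy examples. Several details are our own assumptions rather than statements from [10]: $W$ ranges over subsets of $V \setminus (U \cup \{r\})$, 'changing' $x_j$ means flipping its sign, both conditioning events must have empirical probability above $\gamma/2$, and ties among the largest passing candidates are broken by the highest score.

```python
import itertools

import numpy as np


def cond_prob(X, r, assignment):
    """Empirical P{X_r = +1 | X_k = v for (k, v) in assignment} and the
    empirical probability (mass) of the conditioning event."""
    mask = np.ones(X.shape[0], dtype=bool)
    for k, v in assignment.items():
        mask &= X[:, k] == v
    mass = float(mask.mean())
    if mass == 0.0:
        return 0.0, 0.0
    return float(np.mean(X[mask, r] == 1)), mass


def score(X, r, U, Delta, gamma):
    """Score(U): min over (W, j) of the max over values of the change in the
    conditional law of X_r when x_j is flipped (cf. Eqs. (11)-(12)).
    For binary X_r it suffices to compare P{X_r = +1 | ...}."""
    others = [k for k in range(X.shape[1]) if k != r and k not in U]
    best = np.inf
    for size in range(Delta + 1):
        for W in itertools.combinations(others, size):
            keys = list(W) + list(U)
            for j in U:
                worst = 0.0
                for vals in itertools.product((-1, 1), repeat=len(keys)):
                    a = dict(zip(keys, vals))
                    b = dict(a)
                    b[j] = -a[j]                      # change the value of x_j only
                    p_a, m_a = cond_prob(X, r, a)
                    p_b, m_b = cond_prob(X, r, b)
                    if m_a > gamma / 2 and m_b > gamma / 2:
                        worst = max(worst, abs(p_a - p_b))
                best = min(best, worst)
    return best


def local_independence_test(X, r, Delta, eps, gamma):
    """Ind(eps, gamma) for node r: the largest U, |U| <= Delta, with Score(U) > eps/2."""
    candidates = [k for k in range(X.shape[1]) if k != r]
    for size in range(Delta, 0, -1):
        passing = []
        for U in itertools.combinations(candidates, size):
            s = score(X, r, U, Delta, gamma)
            if s > eps / 2:
                passing.append((s, U))
        if passing:
            return set(max(passing)[1])
    return set()
```

For instance, on exact samples from a four-node path with $\theta = 1$, $\epsilon = 0.4$ and $\gamma = 0.01$, this sketch is expected to return the two path neighbors of an interior node, because conditioning on them makes every other spin irrelevant for $x_r$.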



^3If $a$ is a vector and $R$ is a set of indices then we denote by $a_R$ the vector formed by the components of $a$ with index
in $R$.
Theorem 2.4. Let $G$ be a graph of bounded degree $\Delta > 1$. For every $\theta$ there exist $(\epsilon,\gamma)$, and a
numerical constant $K$, such that

$$n_{{\rm Ind}(\epsilon,\gamma)}(G,\theta) \le \frac{100\Delta}{\epsilon^2\gamma^4}\log\frac{2p}{\delta}, \qquad (13)$$

$$\chi_{{\rm Ind}(\epsilon,\gamma)}(G,\theta) \le K(2p)^{2\Delta+1}\log p. \qquad (14)$$

More specifically, one can take $\epsilon = \frac{1}{4}\sinh(2\theta)$, $\gamma = e^{-4\Delta\theta}\,2^{-2\Delta}$.
This first result implies in particular that $G$ can be reconstructed with polynomial complexity
for any bounded $\Delta$. However, the degree of such polynomial is rather high and non-uniform in $\Delta$.
This makes the above approach impractical.

A way out was proposed in [10]. The idea is to identify a set of 'potential neighbors' of vertex $r$
via thresholding:

$$B(r) = \{ i \in V : \widehat{C}_{ri} > \kappa/2 \}. \qquad (15)$$

For each node $r \in V$, we evaluate Score($U$) by restricting the minimum in Eq. (11) over $W \subseteq B(r)$,
and search only over $U \subseteq B(r)$. We call this algorithm IndD($\epsilon,\gamma,\kappa$). The basic intuition here is that
$\widehat{C}_{ri}$ decreases rapidly with the graph distance between vertices $r$ and $i$. As mentioned above, this is
true at high temperature (small $\theta$).
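The pre-filter (15) is a one-liner once the empirical correlations are available. In the Python sketch below (illustrative; the function name is ours) we exclude $r$ itself from $B(r)$, since $\widehat{C}_{rr} = 1$ always passes the threshold; this is our reading rather than something stated in the text.

```python
import numpy as np


def candidate_neighbors(X, r, kappa):
    """B(r) = {i != r : C_ri > kappa / 2}, Eq. (15), for an (n, p) array of +/-1 samples."""
    X = np.asarray(X, dtype=float)
    C_r = X.T @ X[:, r] / X.shape[0]       # row r of the empirical correlation matrix
    return {i for i in range(X.shape[1]) if i != r and C_r[i] > kappa / 2}


X = [[1, 1, -1], [1, 1, 1], [-1, -1, 1], [-1, -1, -1]]
print(candidate_neighbors(X, 0, kappa=0.5))   # prints {1}
```

Here $\widehat{C}_{01} = 1$ and $\widehat{C}_{02} = 0$, so only vertex 1 survives the threshold $\kappa/2 = 0.25$.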

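Theorem 2.5. Let $G$ be a graph of bounded degree $\Delta > 1$. Assume that $\theta < K/\Delta$ for some small
enough constant $K$. Then there exist $\epsilon, \gamma, \kappa$ such that

$$n_{{\rm IndD}(\epsilon,\gamma,\kappa)}(G,\theta) \le 8(\kappa^{-2} + 8^{\Delta})\log\frac{4p}{\delta}, \qquad (16)$$

$$\chi_{{\rm IndD}(\epsilon,\gamma,\kappa)}(G,\theta) \le K' p\,\Delta^{\cdots} + K'\Delta p^2\log p. \qquad (17)$$

More specifically, we can take $\kappa = \tanh\theta$, $\epsilon = \frac{1}{4}\sinh(2\theta)$ and $\gamma = e^{-4\Delta\theta}\,2^{-2\Delta}$.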

2.2.2 Regularized Pseudo-Likelihoods 

A different approach to the learning problem consists in maximizing an appropriate empirical likelihood function [11, 25, 27]. In order to control statistical fluctuations, and select sparse
graphs, a regularization term is often added to the cost function.

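As a specific low-complexity implementation of this idea, we consider the $\ell_1$-regularized pseudo-likelihood method of [11]. For each node $r$, the following likelihood function is considered

$$L(\theta; \{x^{(\ell)}\}) = -\frac{1}{n}\sum_{\ell=1}^{n}\log\mathbb{P}_{G,\theta}(x_r^{(\ell)} \mid x_{\backslash r}^{(\ell)}) \qquad (18)$$

where $x_{\backslash r} = x_{V\setminus r} = \{x_i : i \in V \setminus r\}$ is the vector of all variables except $x_r$, and $\mathbb{P}_{G,\theta}$ is defined from
the following extension of (1),

$$\mu_{G,\theta}(x) = \frac{1}{Z_{G,\theta}}\prod_{i,j\in V,\ i<j} e^{\theta_{ij}x_ix_j} \qquad (19)$$

where $\theta = \{\theta_{ij}\}_{i,j\in V}$ is a vector of real parameters. Model (1) corresponds to $\theta_{ij} = 0$, $\forall (i,j) \notin E$,
and $\theta_{ij} = \theta$, $\forall (i,j) \in E$.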
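For a single node the pseudo-likelihood (18) has a simple closed form, which we spell out since the gradient and Hessian formulas of Section 4.1 follow from it. In (19), the factors that involve $x_r$ contribute $e^{x_r s_r}$ with $s_r = \sum_{t\in V\setminus\{r\}}\theta_{rt}x_t$, so that

$$\mathbb{P}_{G,\theta}(x_r \mid x_{\backslash r}) = \frac{e^{x_r s_r}}{e^{s_r} + e^{-s_r}} = \frac{e^{x_r s_r}}{2\cosh s_r}, \qquad -\log\mathbb{P}_{G,\theta}(x_r \mid x_{\backslash r}) = \log(2\cosh s_r) - x_r s_r,$$

$$\frac{\partial}{\partial\theta_{ri}}\Big[-\log\mathbb{P}_{G,\theta}(x_r \mid x_{\backslash r})\Big] = x_i\,(\tanh s_r - x_r).$$

Since $x_r \in \{\pm1\}$, one also has $-\log\mathbb{P}_{G,\theta}(x_r \mid x_{\backslash r}) = \log(1 + e^{-2x_r s_r})$, so minimizing (18) amounts to a logistic regression of $x_r$ on $x_{\backslash r}$ with weights $2\theta_{rt}$.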

The function $L(\theta; \{x^{(\ell)}\})$ depends only on $\theta_{r,\cdot} = \{\theta_{rj},\ j \in V\setminus\{r\}\}$ and is used to estimate the
neighborhood of each node by the following algorithm, Rlr($\lambda$),
Regularized Logistic Regression( samples $\{x^{(\ell)}\}$, regularization $(\lambda)$ )
1: Select a node $r \in V$;

2: Calculate $\hat\theta_{r,\cdot} = \arg\min_{\theta_{r,\cdot}}\{L(\theta_{r,\cdot}; \{x^{(\ell)}\}) + \lambda\|\theta_{r,\cdot}\|_1\}$;

3: If $\hat\theta_{rj} > 0$, set $(r,j) \in E$;
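Rlr($\lambda$) only needs a solver for an $\ell_1$-penalized convex problem. The Python sketch below solves step 2 by proximal gradient descent (ISTA: a gradient step on $L$ followed by soft-thresholding); this solver choice is ours, not that of [11], and any $\ell_1$-logistic solver would do. The gradient used is that of Eq. (23). The example data come from a path, i.e. a tree, on which Rlr($\lambda$) is expected to succeed for a suitable $\lambda$; the value $\lambda = 0.1$ is an illustrative choice, not the scaling of Theorem 2.6.

```python
import numpy as np


def rlr_neighborhood(X, r, lam, n_iter=3000):
    """Rlr(lam) for node r: minimize L(theta) + lam * ||theta||_1 by ISTA.

    L(theta) = -(1/n) sum_l log P(x_r | x_rest) with
    P(x_r | x_rest) = exp(x_r s) / (2 cosh s),  s = sum_t theta_rt x_t.
    Returns ({j : theta_rj > 0}, theta), theta indexed by the vertices != r.
    """
    X = np.asarray(X, dtype=float)
    n, p = X.shape
    others = [j for j in range(p) if j != r]
    Z, y = X[:, others], X[:, r]
    # The Hessian of L is dominated by Z^T Z / n because 1 / cosh^2 <= 1,
    # so 1 / lambda_max(Z^T Z / n) is a safe step size.
    step = 1.0 / np.linalg.eigvalsh(Z.T @ Z / n).max()
    theta = np.zeros(len(others))
    for _ in range(n_iter):
        grad = Z.T @ (np.tanh(Z @ theta) - y) / n                      # Eq. (23)
        u = theta - step * grad
        theta = np.sign(u) * np.maximum(np.abs(u) - step * lam, 0.0)  # soft-threshold
    neighborhood = {others[k] for k in range(len(others)) if theta[k] > 0}
    return neighborhood, theta


# Exact samples from the path 0-1-2-3-4-5 (a tree): each spin copies its
# parent with probability (1 + tanh(theta)) / 2.
rng = np.random.default_rng(0)
n, p, theta_true = 10000, 6, 0.5
X = np.empty((n, p), dtype=int)
X[:, 0] = rng.choice([-1, 1], size=n)
for k in range(1, p):
    keep = rng.random(n) < (1 + np.tanh(theta_true)) / 2
    X[:, k] = np.where(keep, X[:, k - 1], -X[:, k - 1])

print(rlr_neighborhood(X, 2, lam=0.1)[0])
```

With these settings the estimated neighborhood of vertex 2 should, with high probability, be $\{1, 3\}$; the nonzero coefficients are shrunk below $\theta = 0.5$ by the penalty, which is harmless because step 3 only uses their sign.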

Our first result shows that Rlr($\lambda$) indeed reconstructs $G$ if $\theta$ is sufficiently small.

Theorem 2.6. There exist numerical constants $K_1, K_2, K_3$, such that the following is true. Let $G$
be a graph with degree bounded by $\Delta > 3$. If $\theta < K_1/\Delta$, then there exists $\lambda$ such that

$$n_{{\rm Rlr}(\lambda)}(G,\theta) \le K_2\,\theta^{-2}\Delta\log\frac{8p^2}{\delta}. \qquad (20)$$

Further, the above holds with $\lambda = K_3\,\theta\,\Delta^{-1/2}$.

This theorem is proved by noting that for $\theta < K_1/\Delta$ correlations decay exponentially, which
makes all conditions in Theorem 1 of [11] (denoted there by A1 and A2) hold, and then computing
the probability of success as a function of $n$ with slightly more care. The details of the proof are
given in Appendix B.

In order to prove a converse to the above result, we need to make some assumptions on $\lambda$.

Definition 2.7. Given $\theta > 0$, we say that $\lambda$ is reasonable for that value of $\theta$ if the following
conditions hold: (i) Rlr($\lambda$) is successful with probability larger than 1/2 on any star graph (a graph
composed of a vertex $r$ connected to $\Delta$ neighbors, plus isolated vertices) if $n$ is chosen sufficiently
large; (ii) $\lambda \le \delta(n)$ for some sequence $\delta(n) \downarrow 0$.

In other words, assumption (i) requires the algorithm to be successful on a particularly simple
class of graphs, and hence does not entail any loss of generality. Assumption (ii) instead encodes
the standard way of scaling regularization terms, by letting them vanish as the number of samples
increases. This is necessary in order to get asymptotic consistency of the parameter values $\theta_{ij}$. With
these assumptions we can state the following converse theorem, whose proof is deferred to Section
4.3.

Theorem 2.8. There exists a numerical constant $K$ such that the following happens. If $\theta > K/\Delta$,
$\Delta > 3$, then there exist graphs $G$ of degree bounded by $\Delta$ such that for all reasonable $\lambda$,
$n_{{\rm Rlr}(\lambda)}(G) = \infty$, i.e. regularized logistic regression fails with high probability.

The graphs for which regularized logistic regression fails are not contrived examples. Indeed, as
part of the proof of Theorem 2.8, and as proved in Appendix D, we have the following facts about
Rlr($\lambda$):

• If $G$ is a tree, then Rlr($\lambda$) recovers $G$ with high probability for any $\theta$ (for a suitable $\lambda$);

• For every graph $G_p$ in the family described in Section 1.1, Rlr($\lambda$) fails with high probability for
$\theta$ large enough and for all $\lambda$;

• If $G$ is sampled uniformly from the ensemble of regular graphs, Rlr($\lambda$) fails with high probability
for $\theta$ large enough and $\lambda$ 'reasonable';

• If $G$ is a large two-dimensional grid, Rlr($\lambda$) fails with high probability for $\theta$ large enough and $\lambda$
'reasonable'.
We note here that Theorem 2.8 relies on proving that a so-called 'incoherence condition' is
necessary for Rlr to successfully reconstruct $G$. Although a similar result was proven in [29] for
model selection using the Lasso, this paper is the first to prove that a similar incoherence condition
is also necessary when the underlying model is the Ising model.

The intuition behind this is quite simple. Begin by noticing that when $n \to \infty$, and under the
restriction that $\lambda \to 0$, solutions given by Rlr converge to $\theta^*$ [11]. Hence, for large $n$,
we can expand $L$ in a quadratic function centered around $\theta^*$ plus a small stochastic error term.
Consequently, when adding the regularization term to $L$, we obtain a cost function analogous to the
Lasso plus an error term that needs to be controlled. The study of the dominating contribution leads
to the incoherence condition.

In general there are no practical ways to evaluate the incoherence condition for a given graphical
model, since this requires computing expectations with respect to the Ising distribution. As
discussed above, this is hard for $|\theta| > \theta_{\rm uniq}(\Delta)$. Hence this condition had not been checked for families of
graphs. A large part of our technical contribution consists indeed in filling this gap. To this end, we
use tools from mathematical statistical mechanics, namely low temperature series for Ising models
on grids [30, 31], and local weak convergence results for Ising models on random graphs [32, 33].
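For very small graphs the required expectations can be computed exactly by summing over all $2^p$ configurations, which is precisely what becomes infeasible at scale. The Python sketch below does this to evaluate $\max_{j \in S^c}|[Q_{S^cS}Q_{SS}^{-1}\mathbf{1}]_j|$, where $S = \partial r$, $S^c$ contains the other vertices except $r$, and $Q_{ij} = \mathbb{E}\{X_iX_j/\cosh^2(\theta\sum_{t\in S}X_t)\}$ is the Hessian of Section 4.1 at the true parameters. We assume this standard Lasso-type form of the incoherence condition (the value must stay below 1); the precise statement used in our proofs may differ in details. A convenient sanity check: on the path 0-1-2-3-4 with $r = 2$, vertex 0 interacts with the rest only through vertex 1, so the row $Q_{0,S}$ equals $\tanh\theta$ times $Q_{1,S}$ and the value is exactly $\tanh\theta$.

```python
import itertools

import numpy as np


def incoherence(edges, p, theta, r):
    """max_{j in S^c} |[Q_{S^c S} Q_{SS}^{-1} 1]_j| at the true parameters, with
    Q_ij = E{X_i X_j / cosh^2(theta * sum_{t in S} X_t)} and S the neighbors of r.
    Expectations are computed exactly by enumerating all 2^p configurations."""
    S = sorted({a if b == r else b for (a, b) in edges if r in (a, b)})
    Sc = [j for j in range(p) if j != r and j not in S]
    if not Sc:
        return 0.0
    x = np.array(list(itertools.product((-1, 1), repeat=p)), dtype=float)
    energy = theta * np.sum([x[:, a] * x[:, b] for (a, b) in edges], axis=0)
    w = np.exp(energy - energy.max())
    w /= w.sum()                                   # Ising probabilities, Eq. (1)
    g = 1.0 / np.cosh(theta * x[:, S].sum(axis=1)) ** 2
    Q = (x * (w * g)[:, None]).T @ x               # Hessian at theta = theta*
    v = Q[np.ix_(Sc, S)] @ np.linalg.solve(Q[np.ix_(S, S)], np.ones(len(S)))
    return float(np.max(np.abs(v)))


path = [(0, 1), (1, 2), (2, 3), (3, 4)]
print(incoherence(path, 5, 0.7, r=2), np.tanh(0.7))

# Toy graph G_p of Section 1.1 with p = 6, relabelled: vertices 0 and 1 are
# joined to each intermediary 2..5; r = 0 plays the role of vertex 1.
G6 = [(a, k) for k in range(2, 6) for a in (0, 1)]
for th in (0.2, 0.5, 1.0):
    print(th, incoherence(G6, 6, th, r=0))
```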

3 Numerical experiments 

In order to explore the practical relevance of the above results, we carried out extensive numerical
simulations using the regularized logistic regression algorithm Rlr($\lambda$). Among other learning algorithms, Rlr($\lambda$) strikes a good balance of complexity and performance. Samples from the Ising model
(1) were generated using Gibbs sampling (a.k.a. Glauber dynamics). Mixing time can be very large
for $\theta > \theta_{\rm uniq}$, and was estimated using the time required for the overall bias to change sign (this is a
quite conservative estimate at low temperature). Generating the samples $\{x^{(\ell)}\}$ was indeed the bulk
of our computational effort and took about 50 days of CPU time on Pentium Dual Core processors.
Notice that Rlr($\lambda$) had been tested in [11] only on tree graphs $G$, or in the weakly coupled regime
$\theta < \theta_{\rm uniq}$. In these cases sampling from the Ising model is easy, but structural learning is also
intrinsically easier.
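A minimal single-site Gibbs sampler (heat-bath Glauber dynamics) for the model (1) is sketched below in Python. It is our own illustration, not the code used for the experiments; the burn-in and thinning parameters are arbitrary placeholders and, as noted above, for $\theta > \theta_{\rm uniq}$ they may be far too small for the chain to mix. Each update resamples $x_i$ from its conditional law, $\mathbb{P}\{x_i = +1 \mid x_{\backslash i}\} = (1 + \tanh(\theta\sum_{j\in\partial i}x_j))/2$.

```python
import numpy as np


def gibbs_sample(adj, theta, n_samples, burn_in=1000, thin=1, seed=0):
    """Heat-bath Gibbs sampler for the Ising model (1) on a graph given by
    neighbor lists adj; returns an (n_samples, p) array of +/-1 spins."""
    rng = np.random.default_rng(seed)
    p = len(adj)
    x = rng.choice([-1, 1], size=p)
    out = np.empty((n_samples, p), dtype=int)
    for sweep in range(burn_in + n_samples * thin):
        for i in range(p):                       # one sweep = p single-site updates
            field = theta * sum(x[j] for j in adj[i])
            # P(x_i = +1 | rest) = (1 + tanh(field)) / 2
            x[i] = 1 if rng.random() < (1 + np.tanh(field)) / 2 else -1
        if sweep >= burn_in and (sweep - burn_in) % thin == 0:
            out[(sweep - burn_in) // thin] = x
    return out


adj = [[1], [0]]                                 # a single edge
X = gibbs_sample(adj, 0.5, n_samples=20000)
print(np.mean(X[:, 0] * X[:, 1]), np.tanh(0.5))  # should be close, cf. Eq. (3)
```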

Figure 2 reports the success probability of Rlr($\lambda$) when applied to random subgraphs of a $7\times 7$
two-dimensional grid. Each such graph was obtained by removing each edge independently with
probability $p = 0.3$. Success probability was estimated by applying Rlr($\lambda$) to each vertex of 8 graphs
(thus averaging over 392 runs of Rlr($\lambda$)), using $n = 4500$ samples. We scaled the regularization
parameter as $\lambda = 2\lambda_0\theta(\log p/n)^{1/2}$ (this choice is motivated by the algorithm analysis [11] and is
empirically the most satisfactory), and searched over $\lambda_0$.

The data clearly illustrate the phenomenon discussed in the previous pages. Despite the large
number of samples $n \gg \log p$, when $\theta$ crosses a threshold, the algorithm starts performing poorly
irrespective of $\lambda$. Intriguingly, this threshold is not far from the critical point of the Ising model on
a randomly diluted grid, $\theta_{\rm crit}(p = 0.3) \approx 0.7$.

Figure 3 presents similar data when $G$ is a uniformly random graph of degree $\Delta = 4$, over $p = 50$
vertices. The evolution of the success probability with $n$ clearly shows a dichotomy. When $\theta$ is below
a threshold, a small number of samples is sufficient to reconstruct $G$ with high probability. Above
the threshold, even the largest number of samples we used is too few. In this case we can predict the threshold analytically,
cf. Lemma 4.3 below, and get $\theta_{\rm thr}(\Delta = 4) \approx 0.4203$, which compares favorably with the data.
Figure 2: Learning random subgraphs of a $7\times 7$ ($p = 49$) two-dimensional grid from $n = 4500$
Ising model samples, using regularized logistic regression. Left: success probability as a function
of the model parameter $\theta$ and of the regularization parameter $\lambda_0$ (darker corresponds to higher
probability). Right: the same data plotted for several choices of $\lambda$ versus $\theta$. The vertical line
corresponds to the model critical temperature. The thick line is an envelope of the curves obtained
for different $\lambda$, and should correspond to optimal regularization.




Figure 3: Learning uniformly random graphs of degree $\Delta = 4$ from Ising model samples, using
regularized logistic regression. Left: success probability as a function of the number of samples $n$
for several values of $\theta$. Dotted: $\theta = 0.10, 0.15, 0.20, 0.35, 0.40$ (in all these cases $\theta < \theta_{\rm thr}(\Delta = 4)$).
Dashed: $\theta = 0.45, 0.50, 0.55, 0.60, 0.65$ ($\theta > \theta_{\rm thr}(4)$; some of these are indistinguishable from the
axis). Right: the same data plotted for several choices of $\lambda$ versus $\theta$ as in Fig. 2, right panel.



11 



4 Proofs 



4.1 Notation and important remarks 

Before proceeding it is convenient to introduce some notation and make some important remarks. If 
y is a matrix and R is an index set then Vr denotes the vector formed by all entries whose index lies 
in R and similarly, if M is a matrix and R, P are index sets then M/j p denotes the submatrix with 
row indices in R and column indices in P. As before, we let r be the vertex whose neighborhood 
we are trying to reconstruct and define S = dr and S'^ = V \ dr[J r. Since the cost function 
L(^; {x^^)}) + A||^||i only depends on 9_ through its components . = {^rj}, we will hereafter neglect 
all the other parameters and write ^ as a shorthand of 0_j. .. 

Let z* be a subgradient of ||^||i evaluated at the true parameters values, 9^ = {6rj ■ Oij = 0, Vj ^ 
dr,9rj = 0, Vj S dr}. Let 9 be the parameter estimate returned by Rlr(A) when the number of 
samples is n. Note that, since we assumed 9* > 0, we have 9*g > and hence Zg = 1. Define 
g"(^; to be the Hessian of L{9; {x^}) and Q{9) = lim„^oo {x^}). By the law of large 

numbers Q{9) exists a.s. and is the Hessian of Ec^g log F'cel-'^rl^yr) where Eg,9 is the expectation 
with respect to (|19p and X is a random variable distributed according to (jl9p . It is convenient to 
recall here the expressions for the Hessian and gradient of L for finite n and in the limit when n — )• cxd. 
For all i,j £ V\{r} we have, 

1=1 COSn \l^f^v\{r} ^rtXt j 

Qm = { ^2.J^'^' , ^ ) ' (22) 

Vcosh {Etev\{r}^rtXt) J 

1 " 

[VL-(£)], = l5;xf (tanh( 0,,xf))-xW), (23) 

^ e=i tev\{r} 

[VL(£)]i =EG,r{^*tanh( e^Xt) } - E^r {^^^r}- (24) 

t£V\{r} 

Note that from the last expression it follows that 'VL(9*) = 0. 

We will denote the maximum and minimum eigenvalue of a symmetric matrix M by CTuiaxiM) 
and (TrainiM) respectively. Recall that ||M||oo = maxj Ylj 

We will omit arguments whenever clear from the context. Any quantity evaluated at the true 
parameter values will be represented with a *, e.g. Q* = Qi9*). Quantities under a A depend on n. 
When clear from the context and since all the examples that we work on have 9ij £ {0, ^ }, we will 
write ^G,e ^ ^G,e even simply E. Similarly, P^e will be sometimes written as simply Pfj^g or just 
P. A subscript n under ^G,9y i-e. Pn,G,e) will be introduced to denote the product measure formed 
by n copies of model ([19]). Through out this section Pgucc will denote the probability of success of 
a given algorithm, that is, the probability that the algorithm is able to recover the underlying G 
exactly. 

Throughout this section G is a graph of maximum degree A. 

4.2 Simple Thresholding 

In the following we let Cij = Kq g{XiXj} where expectation is taken with respect to the Ising model 
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Proof. (Theorem 12.11 ) If G is a tree then Cij = tanh 9 for all (ij) G E and Cij < tanh^ 9 for all 
{ij) ^ E. To see this notice that only paths that connect i to j contribute to Cij and given that 
G is a tree there is only one such path and its length is exactly 1 if G E and at least 2 when 
^ E. The probability that Thr(T) fails is 

1 - Psucc = ^n,G,e{Cij < T for some e E or Cij > r for some ^ E} . (25) 

Let T = (tanh^ + tanh^ 0)/2. Applying Azuma-Hoeffding inequality to Cij = ^Yll=i^i^^^f^ 
have that if £ E then, 

IPn,G,e(Q, < r) = P„,G,e (^p^'^xf^ - Cij) < n{T - tanh^)^ < g-^nCtanhe-tanh^ e)^ (26) 
and if ^ E then similarly, 

ff'n,G,e(a, > r) = P„,G,e - C'*.) ^ - t^^h' ^ e-^"(*-^^-*-i^'^)'. (27) 

Applying union bound over the two possibilities, £ E or ^ E, and over the edges {\E\ < 
p /2), we can bound 

succ by 

Psucc > 1 _p2g-^n(tanhe-tanh2e)2 _ (28) 

Imposing the right hand side to be larger than 6 proves our result. □ 

Proof. (Theorem EJ]) We will prove that, for 9 < arctanh(l/(2A)), dj > tanh 6* for all G E 
and Cij < 1/(2A) for all (ij) ^ E. In particular dj < Cm for all ^ E and ah {k,l) G E . The 
theorem follows from this fact via union bound and Azuma-Hoeffding inequality as in the proof of 
Theorem 12. 1[ 

The bound Cij > tanh^ for (ij) £ E is a direct consequence of Griffiths inequality [36] : compare 
the expectation of XiXj in G with the same expectation in the graph that only includes edge (i, j). 

The second bound is derived using the technique of [35], i-e., bound Cij by the generating function 
for self-avoiding walks on the graphs from i to j. More precisely, assume I = dist(i, j) and denote by 
Nij{k) the number of self avoiding walks of length k between i and j on G. Then [35] proves that 

< f (tanh^)^Ar,,(^) < f A^-i(tanh ^)^ < < ^(^J^ML. (29) 

- ; »n;-Z^ V J - i_/^tanh6' -1-Atanh6l ^ ^ 

k=l n=l 

li 9 < arctanh(l/(2A)) the above implies Cij < 1/(2A) which is our claim. □ 

Proof. (Theorem 12. 3p The theorem is proved by constructing G as follows: sample a uniformly 
random regular graph of degree A over the p — 2 vertices {1, 2, ... ,p — 2} = \p — 2]. Add an extra 
edge between nodes p — 1 and p. The resulting graph is not connected. We claim that for 9 > K/ A 
and with probability converging to 1 as p — >• c«, there exist i,j G [p — 2] such that ^ E and 

Cij > Cp-i-p. As a consequence, thresholding fails. 

Obviously Cp-i^p = tanh^. Choose i £ [p — 2] uniformly at random, and j a node at a fixed 
distance t from i. We can compute Cij as p — t- 00 using the same local weak convergence result as in 
the proof of Lemma 14.31 Namely, Cij converges to the correlation between the root and a leaf node 
in the tree Ising model ()45p . In particular one can show, [33], that 

lim di > mi9f , (30) 

where m{9) = tanh(A/i*/(A — 1)) and h* is the unique positive solution of Eq. (j46p . 

The proof is completed by showing that tanh^ < m{9)^ for all 9 > K/ A. □ 
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4.3 Proof of Theorem 12.8^ failure of regularized logistic regression 



In order to prove Theorem 12 .81 we need a few auxiliary results. Our first auxiliary results establishes 
that, if A is small, then \\Q*scsQ*ss^^*s\\ 

oo > 1 is a sufficient condition for the failure of Rlr(A). We 
recall here that the subgradient of ||^||i evaluated at ^*,that is z* , satisfies Zg = 1. 

Lemma 4.1. Assume [Q*scsQ*ss^^ zt;]i > 1 + e for some e > and some row i €z V , crj^i^{Q*gg) > 
Cmin > 0, and A < C^j^e/(2'^(1 + e^)A^). Then the success probability of Rlr(A) is upper bounded as 

Psucc < 4A2e-"''A + 4A e-"^'^fl (31) 

where 6a = {ClJ32A)e and 5b = (Cmm/64VA)e. 

The next Lemma implies that, for A to be 'reasonable' (in the sense introduced in Section [2. 2. 2p . 
nX^ must be unbounded with respect to p. In fact, by this lemma, if we choose n to be very large 
and choose a sequence of star graphs of increasing number of nodes but with only one edge between 
the central node and the remaining nodes, then, unless K is increasing with p, Rlr(A) will fail to 
reconstruct the graph with a probability greater than 1/2, which is a contradiction if A is 'reasonable'. 

Lemma 4.2. There exist M = M{K,9) > decreasing with K for 6 > 
true: If G is the (star) graph with vertex set V = [p] and edge set E = 
and n}? < K, then 

p ^ -M{K,e)p , -n(l-tanh 9)2/32 



such that the following is 
{{r,i)} (e.g. r = 1, i = 2) 

(32) 



Finally, our key result shows that the condition \\Q*scsQ*ss^ -^slloo < 1 is violated with high 
probability for large random graphs. The proof of this result relies on a local weak convergence 
result for ferromagnetic Ising models on random graphs proved in |32j . 

Lemma 4.3. Let G be a uniformly random regular graph of degree A > 3. Then, there exists 0thr(A) 
such that, for 9 > 0thr(A), \\Q*scsQ*ss^^ Zg\\ao > l + e{6,A) with probability converging to 1 as p ^ oo 
(e{e, A) > and e(6'. A) as 9 ^ oc). 

Furthermore, for large A, 0thr(A) = 9 A^^{1 + o{l)). The constant 9 is given by 9 = and h^a 
is the unique positive solution of 

/loo tanh /loo = 1- (33) 

Finally, there exist Cmin > dependent only on A and 9 such that amin{Q*ss) ^ C'min with probability 
converging to 1 as p ^ oo. 

The proofs of Lemmas 14.11 14.21 and 14.31 are sketched in the next subsection. 

Proof. (Theorem 12. 8p Fix A > 3, ^ > K/A (where is a large enough constant independent of A), 
and e, Cmin > and both small enough. By Lemma 14. 3| for any p large enough we can choose a 
A-regular graph Gp = {V = \p],Ep) and vertex r G V such that \Q*s'=sQ*ss~^ ^s\i ^ 1 + e for some 
i GV \ r (Indeed most vertices r and graphs Gp will work) . 

By Theorem 1 in flO\ we can assume without loss of generality n > K'Alogp for some small 
constant K' . Further by Lemma 14.21 ^-^^ ^ P{p) for some F(j)) oo as p ^ oo and the condition 
of Lemma 14.11 on A is satisfied since by the assumption that A is 'reasonable' we have A — )• as 
n — )• oo. Using these results in Eq. (131 h of Lemma 14.11 we get the following upper bound on the 
success probability 

Psucc(Gp) < 4A V^-' ^'^ + 2A e~^^P^^B . (34) 
In particular Psucc (Cp) — as p — 00. □ 



14 



4.3.1 Proofs of auxiliary lemmas 

Proof. (Lemma I4.ip This proof follows closely the proof of Proposition 1 in [11] . For a matter of 
clarify of exposition we will include all the steps, even if these do not differ from the exposition done 
in [H]- 

We will show that (under the assumptions of the Lemma on the Incoherence Condition, o'rainiQ*ss) 
and A) if ^ = (^5,^50) = (^5,0) with ^5 > then the probability that Rlr(A) returns 9 is upper 
bounded as in Eq. (j3ip . More specifically, we will show that this 9 will not satisfy the stationarity 
condition 'VL{9) + Az = with high probability for any subgradient z of the function \\9\\i at 9. 

To simplify notation we will omit {x^^^} in all the expressions involving and derived from L. 

Assume the event 'VL{9) + Xz = holds for some 9 as specified above. An application of the 
mean value theorem yields 



V'L{9*)[9 - 9*] = W'^ -\z-R^ , 



(35) 



where W = -VL{9*) and = [V^L{9^^'') - V'^L{9*)]J (9- 9*) with 9^^^ a point in the line from 

^ to Notice that by definition V^L{9*) = Q"* = Q'^{9^). To simphfy notation we will omit the * 
in all Q"*. All in this proof are thus evaluated at 

Breaking this expression into its S and components and since 9_gc = 9*gc = we can write 



(36) 
(37) 

(38) 



Q'^scsiG-s - 0*s) = W^c - Xzsc + i?gc, 
Q^ssih-is) = WS-Xzs + R'^s- 
Eliminating 9g — 9*^ from the two expressions we obtain 

mc - i2Sa] - Q''sCsiQss)-'m - Rs] + >^QsCsiQssr'^S = X^c 
Now notice that Q'^gc siQ'ss)'^ = Ti + Ta + Tg + r4 where 

Ti = Q^csliQ'ss)'^ - {QlssT^] 1 ^2 = - Q*sc 

= [Q'sc s ~ Q*sc s\\.^Q'ss) ^ ~ {Q*ss) 1 '^i = Q*sCsQ*ss ^• 

Recalling that Z5 = 1 and using the above decomposition we can lower bound the absolute value of 
the indexed-i component of Zgc by 

\zi\'>\\[Q*sC sQ*SS ^%]i||oo — — ||72,i||l ~ ll^3,j||l (39) 



SS 









A 




A 



WiQscsiQss) 



A 



+ 



A 



We will now assume that the samples {x^^^} are such that the following event holds (notice that 



Si 



{\\Q 



SVJ{i} s 



Q*s 



SU{i} SWoo 



< a, 



A 



< 



(40) 



where U = C^i,e/(8A) and = C^me/ilGVA). 

Prom relations to in Section O we know that EG,e((5") = Q*, EceiW"") = and 
that both Q" — Q* and are sums i.i.d. random variables bounded by 2. From this, a simple 
application of Azuma-Hoeffding inequality yields @. 



VG,e(|Q^--Qy >5) <2e- — , 
IPn,G,e(|W^;}| X^) <2e- — , 



(41) 
(42) 



*For full details see the proof of Lemma 2 and the discussion following Lemma 6 in [11] 
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for all i and j. Applying union bound we conclude that the event £{ holds with probability at least 



1 - 2A(A + l)e 



nil 



2{A + l)e' 



> 1 - AA'^e- 



4Ae 



(43) 



where 5a = C^i„e/(32A) and Sb = C„,ine/(64^/A). 

If the event holds then (Tmm(Q55') > (^mm{Q*ss)-Cmin/'2 > C'min/2. Since \\[Q'^csiQss 
\\Qss~%\\QU2 and < mj we can write 1^^03(^33)'']'^ 
lower bound to 



< 



> \\[Q*sCsQ*SS ^^s] 



A 



A 



ill 00 ||^l,j||l 



mm 



A 



^2,2 111 

+ 



< 2\/A/Cmm and simplify our 

(44) 



A 



The proof is completed by showing that the event Si and the assumptions of the theorem imply 
that each of last 7 terms in this expression is smaller than e/8. Since \ [Q*gc sQ*ss~^'^ ^ 1 + ^ by 
assumption, this implies \zi\ > l + e/8 > 1 which cannot be true since any subgradient of the 1-norm 
has components of magnitude at most 1. 

Taking into account that CTininiQ*ss) — niaxjj Q*^ < 1 and that A > 1, the last condition on £{ 
immediately bounds all terms involving by e/8. Some straightforward manipulations imply (see 
Lemma 7 from [llj for a similar computation) 



IT1 



< 



A 



WQss-Q 



f<2 
min 

2A 

T3,i\\l < ■^2~\\QsS 



SSWco , 



< 



Q 



SSWoo 



s^s 



II [Q 



s^s 



Qs 



SCs\^\\oo ) 



and thus, again making use of the fact that (Tjam{Q*ss) ^ all will be bounded by e/8 when £i holds. 
The final step of the proof consists in showing that if £i holds and A satisfies the condition given in 
the Lemma enunciation then the terms involving i?" will also be bounded above by e/8. The details 
of this calculation are included in Appendix IC.ll □ 



Proof. (Lemma 14.31 ) Let us state explicitly the local weak convergence result mentioned in Sec 
right before our statement of Lemma 14.31 For t G W, let T{t) = {Vt,Ej) be the regular rooted tree 
of degree A of t generations and define the associated Ising measure as 



MT,e(^) 



1 



n 



n 



Here 9T(t) is the set of leaves of T(t) and h* is the unique positive solution of 

/i = (A — 1) atanh {tanh tanh K} . 



(45) 



(46) 



It was proved in [33] that non-trivial local expectations with respect to fJ-cdi-S.) converge to local 
expectations with respect to /ijg(x), as p — >• 00. 

More precisely, let Br{t) denote a ball of radius t around node r ^ G (the node whose neighbor- 
hood we are trying to reconstruct). For any fixed t, the probability that Br{t) is not isomorphic to 
T(i) goes to as p — )■ cxD. 
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Let 5(^Br(t)) function of the variables in Br{t) such that 5(^6^(4)) ~ 9i~3±Br{t))- Then 

almost surely over graph sequences Gp of uniformly random regular graphs with p nodes (expectations 
here are taken with respect to the measures ([T]) and (j45p ) 

hm EG,9{9{Xs^(t))} = ^mA+{9{Xj^t))} ■ (47) 
Notice that this characterizes expectations completely since if (/(xg^.^j-j) = — 5'(— ^Br(i)) then, 

^GA9iXs^it))} = 0. (48) 

The proof consists in considering [Q*scsQ*ss~^ ^s^i ^'^^ * ~ dist(r, i) bounded. We then write {Q*ss)ik = 
^G,e{9i,k{Ks^^,^)} and {Q*scs)ii = ^G,e{9iAX.g^^t)^} some functions g.,.{K^^^^^) and apply the weak 
convergence result (jT7|) to these expectations. We thus reduced the calculation of [Q*scsQ*ss~^ ^s^i ^o 
the calculation of expectations with respect to the tree measure (I45p . The latter can be implemented 
explicitly through a recursive procedure, with simplifications arising thanks to the tree symmetry 
and by taking t ^ 1. The actual calculations consist in a (very) long exercise in calculus and is 
deferred to Appendix IC.3[ 

The lower bound on cr^[n{Q*gg) is proved by a similar calculation. □ 
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A Covariance calculation for the toy example 



In this section we compute the covariance matrix for the Ising model on the graph Gp introduced 
within the toy example of Section [TTTl see Fig. [H In fact we only need to compute '&Gj,fi{xiX2} , 
"KCpfii^ix^} and 'EGj,fi{xzXi}, since all other covariances reduce to one of these tree by symmetry. 
First recall that by [35j we can write the correlation between Xi and Xj as follows 

EFa(G)(tanhO)l''l 

where: (i) X{G) is the set of all subsets of edges of graphs of G with odd number of edges adjacent 
to node i and j and even number of edges adjacent to every other node; {ii) V{G) is the set of all 
subsets of edges of G with even number of edges in all nodes; (iii) \F\ is the number of edges in F. 

Expression ()49p implies three basic facts that we will use to compute the correlations of Gp. Some 
of these observations can be proved in different and maybe simpler ways but for a matter of unity, 
we will explain them from the point of view of ()49p . 

First, if i,j are two nodes in a graph G and k, I two nodes in a graph G' and we 'glue' j and k 
together (i.e. we fix xj = Xk) to form a new graph G" (see Figured] (a)) then 

^G",e{xiXi} = EG,e{xiXj} EceixkXi}. (50) 

Second, if instead we 'glue' i with k and j with / (i.e. we fix Xi = x^ and xj = xi) (see Figured] 
(b)) then 



EcejxjXj} + EG',e{xkXi} 
+ EG,e{xiXj} EG',e{xkXi} 
tanh(arctanh(EG' ejxjXj}) + arctanh(EG/^e{xfcX/})). (52) 



l + EG oixixAEa' /)ixkxj \ 



Note that in this second case we are computing EG"fi{xiXj} and not EG'\e{xiXi}. 

Finally, if G is the square graph formed by nodes {1, 2, 3, 4} and edge set {(1, 3), (1,4), (2, 3), (2, 4)} 
and G' is some other graph to which nodes i and j belong and we 'glue' node 1 with i and node 2 
with j (i.e. xi = Xi and X2 = xj) to form G" (see Figured] (c)) then 

Itanh^e + lEGoeixixAtanli^e , ^ 

Eq^-, e\X'j,xA = — ; T^. (53) 

(-i2,»i 4j i + tanh^& + 2EG2,0{xiXj}tanh2 ^ ' 

With these three relationships we can quickly compute Eg'p^6i{j;iX2}, EGpfi{x\x-i\ and EGpfi{x'iX/^\. 
Let p = 3 and note that from (I50p we have that EGo_fi{x\X2\ = tanh^ Q. Since Gp is formed by p — 2 
copies of G3 glued in 'parallel' in between nodes 1 and 2, by ([52]) we have that EGpfi{x\X2\ = 
tanh((p — 2) arctanh(tanh^ 0)). Now notice that Gp can also be seen as a single edge connecting 1 
and 3 in 'parallel' with the graph formed by connecting in 'series' the edge (2, 3) to a copy of Gp-i. 
This tells us that EGpfiixix^} = tanh(6' + arctanh(EG'p_-^^6i{xiX2} tanh 0)). Finally, we can also see 
Gp as a square graph formed by nodes {1,2,3,4} and edges {(1,3), (1,4), (2,3), (2,4)} to which we 
add Gp-2 as a 'bridge' in between nodes 1 and 2. Making use of (f53]l we get that 

GpA^sx^l - ^ ^ ^^^^A Q ^ 2EGp_^,e{xiX2] tanh^ 6' ^ ' 

From these closed form expressions it is now easy to obtain the behavior of the correlations for the 
regime 9^1 and p <C 1. 
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(a) 'Series' composition 



(b) 'Parallel' composi- 
tion 



(c) 'Bridge' composi- 
tion 



Figure 4: Correlation for different composite graphs 



B Success of regularized logistic regression for small 6 

Proof. (Tlieorem 12.61 ) The proof of this theorem consists in verifying the conditions of Theorem 1 
in [H] (denoted there by Al and A2) and computing the probability of success as a function of n 
with slightly more care. 

In what follows, Cmin is a lower bound for ammiQ^s) ^^'^ -^maxEl is an upper bound for 

arn.A^G,e*{XsX^)). (55) 

We define 1 — a = \\Q*gc sQ*ss~^\\<xi aiid ^min is the minimum absolute value of the components of 

r . Throughout this proof we will have C_ and denote a_(Q-,,) and a_ (i Er=i -f-^'^l) 

respectively and 1 — a = \\Q^csQ^s~^\\°°- 

Consider the event, £, that the following conditions hold (these conditions are part of the condi- 
tions required for Theorem 1 in to be applicable and are labeled by the names of the theorems 



that use them that help proving Theorem 1), 

In Lemma 5: HQg^ - QJ^Ib < Cmin/2 , (56) 

In Lemma 6: for Tl (TmmiQ'^ls) < ^111111/2 , (57) 

forTl ||Qg^-Q*^^||^ < l^C^i„/^/A, (58) 

l2 i — a 

CM I — 

for T2 WQlcs - Q5C5II00 < -Cmin/\^, (59) 
for T3 ||Q^C5 - Q^c^lloo < , (60) 

In Lemma 7: a^in(Q"5s) < Gmin/2 and WQls - Q*ssh < a/I %= > (61) 

V o 2V A 



^It is easy to prove that Cmin < i3j 
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In Proposition 1: ^ — r— — r , (62) 

A 4(2 — a) 

AA < — ¥^ , (63) 

^ AVa<^. (64) 



Note that these conditions imply that Cmin > C'min/2, Draa.x < 2L'niax and also, from the proof of 
Proposition 1 in |11) . they imply that without loss of generality we can assume a = a/2. Since the 
assumption (TYain{Q^*ss) — C'min/2 follows from the assumption of Lemma 5 in [11], all assumptions 
are in fact assumptions on the proximity, under different norms, of empirical vectors and matrices 
to their correspondent mean values. 

Having the definition of £ in mind we beging by noting that Theorem 1 can be rewritten in the 
following form. 

Theorem B.l. If X < (Cniin/2) V(20ADmax), A < (Cniin/2)6'niin/(20\/A), I - a < 1 and £ holds 
then Rlr will not fail. 

A straightforward application of Azuma's inequality yields the following upper bound on the 
probability of these assumptions not occurring together, (the first three terms are for the conditions 
involving matrix Q and the fourth with the event dealing with matrix W), 



where 



j(2) _ n.l^/'^^n '^min l 



(66) 



= _L_^%i, (67) 




S^al = min\i^,^li\. (68) 

Under the assumption that 9 < Ki/A for Ki small enough we now calculate lower bounds for Cmin 
and a and upper bound for -Dmax which will allow us to verify the condition of Theorem IB. II and 
simplify expression for the upper bound on Fn^cfii^'^)- 

First notice that by 1^ we have Cmin = (Jmm{^G,tii^ ~ tanh^ 6'M)X5Xj)} where M = 
J2t£dr-^t- Since 9M < OA < Ki and because crmin(^-B) > c7min(^)o'min(-B) we have, Cmin ^ 
(1 - Kl)a^^r,{'&G,e:{^sXs}). Now write ¥.G,e*{XsX'^} = 1 + Q and notice that by (l29|) Q is 
a symmetric matrix whose values are non-negative and smaller than tanh^/(l — Atanh^). Since 
amm(^G,e*{XsXg}) = 1 — {—Q)v for some unit norm vector v and since, by Cauchy-Schwarz 
inequality, we have v^{—Q)v < \\v\\1 maxjj \Qij\ < Atanh0/(1 — Atanh^) < Ki/{1 — Ki), it follows 
that amini^cti^s^s}) ^ (1 - 2Ki)/(l - i^i). Consequently, Cmin > (1 + i^i)(l - 2Ki). With the 
bound ([29|) . and again for Ki small, we can write -Dmax < 1 + Atanh0/(1 — Atanh^) < (1 — Ki)~^. 
A similar calculation yields 1 — a < Ki/{{1 — Kf){l — 2Ki)). 

For Ki small enough, and looking at the bounds just obtained for Cmin and -Dmim the restriction 
on A in Theorem IB.l) namely 

A < Cmin/40^/Amin{e, Cmin/40DmaxVA}, (69) 
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can be simplified to Cmin^/40\/A. Choosing A = K^OA it is easy to see we can simplify the 
expression for the probability upper bound and write 



for some constant K2 which in turn implies the bound on nRir(A). □ 

C Failure of regularized logistic regression at large 9 
C.l Bound on terms involving i?" 

Proof. (Lemma 14. ip We outline here the upper bound on the term ii". Note that we are omitting 
the samples {x^*^} in the argument of function L and we are representing 9^^ by 6. This proof is just 
a replica and fusion of Lemmas 3 and 4 in [11]. Through out this proof we have 6gc = 6*gc = 0. 
First we write, 

= [v^L{l^^'>) - v^L{e*)]J[0 - e*] (71) 
1 " 

= - - V{0*)][^^''> ^^^^""ijll - r] (72) 

1=1 

for some point ^^"'^ lying in the line between 6_ and ^*,i.e. 6^''^ = tj9_ + (1 — tj)9^. Since rj{9) = 

g{xi'^ J2t&v\r ^rtxf ) = g{xi'^0^x^'^) = g{e^x^'^) where g{s) = 46^7(1 + e^'f another application of 
the chain rule yields, 

1 " 

R] = -Y,9'{¥'^''x^'^)x^'^^0 - r]{xf - r]} (73) 
^ 1=1 

= - y{9'iW^^x'^'^)x^p}{[l'^'^ - e*fx^'^ x«^[l - r]} (74) 

i=l 

where 9^^^ is a point in the line between and 0^ . Let 

hi ■= - e_*fx^^ - r] = tjll - e*fx^^ x(*)^[l - r] > (75) 

then, noticing that 9_gc = O^c = and \g'\ < 1 we can apply Holder's inequality to obtain, 

\R]\ < Imu < tMs-o*sf |^E4^ 4^''} ^^s-m < ms-ml (76) 

Slightly readapting the proof of Lemma 3 from [llj we now show that 



Cn,i, f I 16A2 



.-»<^ 1-Wl-A^(l+ ^ )) . (77) 



W. 



A 



Define G{u) = L{e* + u) - L{e*) + X{\\e* + u\\i - \\e*\\i). Since G(0) = and G is strictly convex 
we have that if G{u) > for ||u||2 = B then ||'u||2 < B, where u = 9 — 6^ is the unique minimum 
point of G{u). To prove (j77p we will compute a lower bound on the set of points for which G{u) > 0. 

By the mean value theorem we can write. 
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G{u) = - W^u + u^V^L{e* + au)u + X{\\e* + u\\i - \\e*\\i). (78) 
Note that W = -VL{9*). 

We now get bounds on each of the terms of the previous expression, 

iW'^^ul < \\WS\\ooVa\\u\\2, (79) 
A(||r +n||i - ||r||i) > -XVA\\u\\2 (80) 

u'V'm* + au)u > \\u\\i { f7min(QS/)-A^/^||n||2a„ax I zy'^xfxf]], 



> \\u\\ 

Thus can write, 



(c^in/2-A^/^\\u\\2). (81) 



G{n) > \\n\\2 AV2 (^-A\\u\\l + A^'/'^Wuh - A - ||T^^||oo) 

from which we derive expression ()77p 

If Si holds we can assume withi 
x,x G [0, 1] and thus we can write. 



(82) 



If Si holds we can assume without loss of generality ||^f-||oo < Now notice that 1 — \'l — x < 



16A2A2(l + e)2 



'^^■'-^V4A^ C^. j C^. • 

\ mm / mm 

If we now want that 

A|i?"| e 
AT - 8' 

^•-^mm O 

then we can simply impose that A < C^jj^e/(2''(1 + e2)A^), which finishes the proof. 

□ 

C.2 72 A^ must be unbounded with p 

Proof. (Lemma I4.2p In this proof S = {i} and S'-" = dr\{i}. 

We prove the lemma by computing a lower bound on the probability that || Vcjc-/j(^; {^£^^^}) lloo > A 
under the assumption that nA^ < K and 9_gc = and ^5 > d. This will prove the corresponding 
upper bound on the probability of success of Rlr(A). 

First we show that there exists an C{0) such that if ||^5||oo > C then with high probability Rlr(A) 
fails. 

Begin by noticing that EG,e(i.(^)) > 4'(l-tanh6') and that \L{9) -Eg,0{lCO))\ < 21og2 + 2||^||i. 
Then use Azuma's inequality to get the following bound, 

Pn,GAHi) + MliWi > m)) (85) 

= ^n,GAL{0) - ^Gfi{Lm > log 2 - All^lli - Eg,9(L(^))) (86) 

-2n(log2-Afl,^-EQ fl(L(£)))^ 
> 1 - e (2 1og2+29,;j2 ^ ^gy^ 



^The requirement 6ri > 0, necessary for correct reconstruction, allows us to ignore the ||.||oo and ||.||i in what follows. 
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If ll^slloo > C{9) for large enough C{9) then we can lower bound the previous expression by 

-2n(%)2(l-tanh.)2 -^C^ (1-tanh 6)^ -„(l-tanh 9)^ 

1 — e (2iSg2+2e~p > 1 — e 8(iog2+c)^ > 1 — e 32 . 



(88) 



Since L{9) + A||^||i > L(0) contradicts the fact that ^ is the optimal solution found by Rlr this shows 
that with high probability 9ir must be smaller than C{9). 

Under the assumption that 9ri < C and nX^ < K we will now compute a lower bound for the 
event \\VscL{9)\\oo > A. 



\,G,e{\\\^scHmoc>mi<C 



1 1 



n,G,e [ ||y-^xWxW(l-tanh(xWxf ^.O)lloo > 1 



A n 



> 1 - EG,e (^n,G,0 <VK,We 



{Ce}e=i 



(89) 
(90) 

(91) 

(92) 



where Q = xi.^\l - tanh{xP X^^^ 9ri)). 

Conditioned on {Ce}f^^ all the '^^X^^Ci are independent and identically distributed. Hence, 
choosing one particular G Sc, and defining Vi = X^, we can rewrite the previous expression as, 



^-^Gfi Wn,Gfi [^j2^iC,< 



(93) 



We now use the central limit theorem for independent nonidentical random variables to upper bound 
the conditional probability inside the expectation. It is easy to see that Lyapunov conditions hold. 
In fact, let si = X;"=i Var(V^Q|{Q}" 1) = J2%i Cj then for some (5 > 0, 



and 



EG,e(|F^Q|'+*|{Q}?=i) = IQI'""^ < 00 

1 " 

liP ^E^G,e(|v^^Q-EG,e(^^^Q|{Q}Li)p-"'l{Q}"=i) 

Sn 

1 v^,^,24-rf n. ^/2 ^1 + tanhC(0)\^+'' „ 



(94) 
(95) 



(96) 
(97) 



Thus we can write, 



n,G,e ViC, < VK\{CeYu) = ^.g,. (J^I ^ 



{QlLi ) (98) 
(99) 
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where $ is the cumulative distribution of the normal(0,l) distribution and e„ — )• with n. We can 
finally write, 



<C] > 1 - g(p-i)(i°g('^( i-ta';^cw )+^")) > 1 _ e-M/(i^,e) ^-^^^^ 



IPn,G,e(ll^V5cL(l)|U > 1 

for n big enough. In the above expression M{K,6) as K ^ oo. From this bound and (j88p we 
get the desired upper bound on the probability of success of Rlr. □ 

C.3 Random regular graphs and the violation of the incoherence condition 

Proof. (Lemma l4.3p We explain here the calculations with respect to the tree model (|45p . Throughout 
all calculations we assume that < 6 < oo. An important property that follows from the fixed point 
equation ()36|) is that, if ^(xj^j)) is a function of the variables in T(t) then 

^m,e,+{9iKm)} = ET(t+i),e,+{5(^Tw)} , (101) 

with the obvious identification of T{t) as a subtree of T(t + 1). 

Let r be a uniformly random vertex in G and i ^ j two neighbors of r. Using the local weak 
convergence property ([^7j) with t = 1 we get 



^Qis), - ' = Et,i,.,+ (jJ^). (103) 



where M = J2iedT{i) -^i °f variables on the leaves of a depth 1 tree, and i,j G i9T(l). 

For r' at distance t > 1 from r, consider the A-dimensional vector in 

lim {Q*s.s)r' = Fs{t) . (104) 

Elements of Fs{t) are of the form Ej(() g ^ g^^^'gj^^ ^ where i E 9T(1). These elements can take only 
two different values: one if r' is a child of j and other if not. We denote the first value by Fd(t) and 
the second by Fi(t). Since = 1 is an eigenvector of Q*gg with eigenvalue a + (A — 1)6 we can write, 

lim \\Q*scsQ*ss^*s\\oo = sup \A{t)\ (105) 

where 

^ Fdjt) + (A - 1)F,(0 _ E+(X,,M/cosh^(gM)) 

a + (A -1)6 E+(XiM/cosh2(^M)) ■ ^ ' 

In this expression, and through the rest of the proof, E^. will denote Ex(t'),e,+ where t' is the smallest 
value such that all the variables inside the expectation are in T(i'). Now, conditioning on the value 
of Xi {i S 9T(1)) we can write, 

E+{Xr'M/cosh^{eM)) = ci{t)+ + C2{t)-, (107) 
E+{X^M/cosh^{eM)) = C1-C2. (108) 
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where 

ci = E+(lx,=iM/cosh2(0M)), (109) 
C2 = E+(lx,=-iM/cosh2(0M)), (110) 
= E+iX,,\Xi = l), (111) 
= E+{Xr'\X, = -l). (112) 

(113) 

In the expression above the binomial coefficients are to be assume zero whenever its parameters are 
not integer values. In order to prove that the incoherence condition is violated we will now show 
that B = limt_j>oo ^it) > 1 if is large enough. Writing a first order recurrence relation for and 
it is not hard to see that, 

B — a 2(a — 1) , ^ , , 

a + p — 2 a + p — 2 

B -a 2(1 - /3) , ^ , , 

a + p — 2 a + p — 2 

where 

a = F+{Xr.>, = l\Xr' = l) = e^*+^/{e''*+^ + e~^*-^), (116) 
/3 = P+(X," = -1|X,, = -l) = e-'^*+V(e'**-'^ + e-''*+^), (117) 

and r" denotes a child of r, i.e., a node at distance t + 1 from r. Recall that h* is the unique positive 
solution of (jH]). In the above expression denotes the probability associated with the measure 
()45p where again we can restrict T to the smallest tree containing all the variables that compose the 
event whose probability we are trying to compute. Since 0<q; + /3 — l<lwe have that 

B= (118) 

a + (3-2ci-C2 ^ ^ 

A little bit of algebra allows us to write, 
/3-a ^ (l-/3)-(l-a) 
a + (3 -2 (1- a) + (1-/3) 

P+(X,." = l\Xr' = -1) - ¥+{Xr" = -l\Xr' = 1) 



(119) 
(120) 

(121) 

.(X,, = -l) + l/P+(X,. = 1) ^^^^^ 

- Ifxt : 1) I Ifx: : :!i - ^^(^)^^^^(^-) - ^-^(^^^(a - d). im 

In addition, taking into account that ci and C2 can be expressed as, 

2 A / A - 1 \ me^*™ 



F+{Xr" = l\Xr' = -l)+F+{Xr" = -1|X^/ = 1) 
P+(X^//=l,X^/=-l) _ ¥+{X^„=-l,X^,=l) 

P+(X,,=-1) P+(^,^=l) 
P+(X^//=1,X^,=-1) P+(X^,/=-l,X^/=l) 

P+(X,,=-1) + P+(X,,=1) 

1/P+(X,, = -1) - 1/P+(X,,, = 1) 



Z ^ ) cosh Om 

m=— A ^ 



ci = i? >^ A+m-2 --t:^> (124) 



2 A /A-l\ me 



„h*m 



m=— A ^ 2 ^ 



(126) 
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we have, 

, sinh mh* 



Ci + C2 Y,m=l (^±2:1) 



m- 



cosh dm 



( ^ \ Tn? cosh/i*77i 

^1 ^2 2^„=i l^Aimj^ coshem 



(127) 



Expanding everything in powers of e we get, 

lim \\Q*scsQ*ss'^\\ >B=(l- 26-2/^*^/(^-1) + ...) fl + (A - 2)6-2^^' +2 + ...) (128) 



p— >oo 

Since h* grows with this expansion proves the first part of Lemma 14.31 In fact, this expression 
shows that for large 9, as 6 increases, B decays to 1 from above. Hence, there exists a 0thr(^) such 
that for all 9 > 6'thr(^) we will have limp_>oo IIQgc^Qsslloo > 0. 

Remark C.1. It is interesting to see that the condition $B > 1$ is equivalent to $c_1(\alpha-1)+c_2(1-\beta) > 0$. This implies that if $B > 1$ then $A(t) > B$, and if $B < 1$ then $A(t) < B$. Hence, when $B > 1$ we have $A(t) > 1$ for all $t$, and when $B < 1$ we have $A(t) < 1$ for all $t$. Consequently, $\{\theta : A(t) > 1\} = \{\theta : B > 1\}$, which does not depend on $t$. It is not hard to prove that $A(t) > 0$ for all $t,\theta$, and thus $\{\theta : \lim_{p\to\infty}\|Q^*_{S^CS}(Q^*_{SS})^{-1}\|_\infty > 1\} = \{\theta : B > 1\}$.

We now study how $\theta_{\rm thr}(\Delta)$ scales with $\Delta$ for large $\Delta$. Notice that $B = 1$ is equivalent to $S(\theta) = c_1(\alpha-1)+c_2(1-\beta) = 0$. It is not hard to see that this equation has a single solution. We show that if we search for solutions $\theta$ that scale like $\Delta^{-1}$, then in the limit $\Delta\to\infty$ we get an expression that exhibits a single nontrivial zero. This means that for large $\Delta$ the solution of $S(\theta) = 0$ must be of the form $\bar\theta\Delta^{-1}(1+o(1))$, where $\bar\theta$ is the solution of the scaled equation.

First notice that when $\Delta\to\infty$ and $\theta = \bar\theta/(\Delta-1)$, $h^*$ converges to the solution of $h = \bar\theta\tanh h$. We denote this solution by $h_\infty$. Hence, for large finite $\Delta$ we can say that $h^* = h_\infty + O(\Delta^{-1})$.

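We now write new expressions for $c_1$, $c_2$, $\alpha$ and $\beta$, namely,

$$c_1 = \frac12\,\mathbb{E}_+\Big(\frac{M}{\cosh^2(\theta M)}\Big) + \frac{1}{2\Delta}\,\mathbb{E}_+\Big(\frac{M^2}{\cosh^2(\theta M)}\Big), \qquad (130)$$
$$c_2 = \frac12\,\mathbb{E}_+\Big(\frac{M}{\cosh^2(\theta M)}\Big) - \frac{1}{2\Delta}\,\mathbb{E}_+\Big(\frac{M^2}{\cosh^2(\theta M)}\Big), \qquad (131)$$
$$1-\alpha = \frac12\Big(1-\tanh\big(h^*+\bar\theta/(\Delta-1)\big)\Big), \qquad (132)$$
$$1-\beta = \frac12\Big(1+\tanh\big(h^*-\bar\theta/(\Delta-1)\big)\Big). \qquad (133)$$

Expanding the function $\tanh(\cdot)$ in $\alpha$ and $\beta$ in powers of $\Delta^{-1}$ we can write

$$S(\theta) = \frac12\tanh h^*\,\mathbb{E}_+\Big(\frac{M}{\cosh^2(\theta M)}\Big) - \frac{1}{2\Delta}\,\mathbb{E}_+\Big(\frac{M^2}{\cosh^2(\theta M)}\Big) \qquad (135)$$
$$\qquad + \frac{\bar\theta}{2\Delta(\Delta-1)}\operatorname{sech}^2h^*\,\mathbb{E}_+\Big(\frac{M^2}{\cosh^2(\theta M)}\Big) + O(\Delta^{-2}). \qquad (136)$$

$h^* = (\Delta-1+o(1))\,\theta$.

By Remark C.1, this tells us that there is a single point where $\lim_{p\to\infty}\|Q^*_{S^CS}(Q^*_{SS})^{-1}\|_\infty$ crosses 1.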
Note that we have not expanded $h^*$ in powers of $\Delta^{-1}$. Defining $\mathbb{E}^0_+(\cdot)$ to be the expectation with respect to the tree model (45) where all connections to node $r$ have been removed (the field on each node is still $h^*$), we can write

$$\mathbb{E}_+\Big(\frac{M}{\cosh^2(\theta M)}\Big) = \frac{\mathbb{E}^0_+\big(M/\cosh(\bar\theta M/(\Delta-1))\big)}{\mathbb{E}^0_+\big(\cosh(\bar\theta M/(\Delta-1))\big)}, \qquad (137)$$
$$\mathbb{E}_+\Big(\frac{M^2}{\cosh^2(\theta M)}\Big) = \frac{\mathbb{E}^0_+\big(M^2/\cosh(\bar\theta M/(\Delta-1))\big)}{\mathbb{E}^0_+\big(\cosh(\bar\theta M/(\Delta-1))\big)}. \qquad (138)$$

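In addition, making use of the symmetry of the regular tree and expanding $\cosh(\bar\theta M/(\Delta-1))$ around $\bar\theta M'/(\Delta-1)$ and $\bar\theta M''/(\Delta-1)$ ($M'$ and $M''$ are defined below), we can write

$$\mathbb{E}^0_+\Big(\frac{M}{\cosh(\bar\theta M/(\Delta-1))}\Big) = \Delta\,\mathbb{E}^0_+\Big(\frac{X_i}{\cosh(\bar\theta M/(\Delta-1))}\Big), \qquad (139)$$
$$\mathbb{E}^0_+\Big(\frac{X_i}{\cosh(\bar\theta M/(\Delta-1))}\Big) = \tanh h^*\,\mathbb{E}^0_+\Big(\frac{1}{\cosh(\bar\theta M'/(\Delta-1))}\Big) \qquad (140)$$
$$\qquad - \frac{\bar\theta}{\Delta-1}\,\mathbb{E}^0_+\Big(\frac{\tanh(\bar\theta M'/(\Delta-1))}{\cosh(\bar\theta M'/(\Delta-1))}\Big) + O(\Delta^{-2}), \qquad (141)$$
$$\mathbb{E}^0_+\Big(\frac{M^2}{\cosh(\bar\theta M/(\Delta-1))}\Big) = \Delta\,\mathbb{E}^0_+\Big(\frac{1}{\cosh(\bar\theta M/(\Delta-1))}\Big) \qquad (142)$$
$$\qquad + \Delta(\Delta-1)\,\mathbb{E}^0_+\Big(\frac{X_iX_j}{\cosh(\bar\theta M/(\Delta-1))}\Big), \qquad (143)$$
$$\mathbb{E}^0_+\Big(\frac{X_iX_j}{\cosh(\bar\theta M/(\Delta-1))}\Big) = \tanh^2h^*\,\mathbb{E}^0_+\Big(\frac{1}{\cosh(\bar\theta M''/(\Delta-1))}\Big) \qquad (144)$$
$$\qquad - \frac{2\bar\theta}{\Delta-1}\tanh h^*\,\mathbb{E}^0_+\Big(\frac{\tanh(\bar\theta M''/(\Delta-1))}{\cosh(\bar\theta M''/(\Delta-1))}\Big) + O(\Delta^{-2}), \qquad (145)$$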

where $M' = M - X_i$ and $M'' = M - X_i - X_j$. Using these relations, the law of large numbers and the relation $h^* = h_\infty + O(\Delta^{-1})$, where $h_\infty = \bar\theta\tanh h_\infty$, it is now possible to calculate the limit

hm S{e/{A - 1)) = -l + ^So tanh/^^ ^ ^^^^^ 
A^oo 2 cosh 

This finishes the proof of the second part of the lemma, since $h_\infty$ can now be determined by $h_\infty\tanh h_\infty = 1$ and $\bar\theta = h_\infty^2$.
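As a numerical illustration of this last step, the sketch below solves $h_\infty\tanh h_\infty = 1$ by bisection and sets $\bar\theta = h_\infty^2$, taking both relations as stated in the text; by the scaling argument above, $\theta_{\rm thr}(\Delta)\approx\bar\theta/\Delta$ for large $\Delta$. The bisection helper is a generic stand-in, not code from the paper.

```python
import math


def bisect(f, lo, hi, tol=1e-13):
    """Plain bisection; assumes f(lo) and f(hi) have opposite signs."""
    flo = f(lo)
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        fmid = f(mid)
        if (fmid > 0) == (flo > 0):
            lo, flo = mid, fmid
        else:
            hi = mid
    return 0.5 * (lo + hi)


# h_inf is the positive root of h * tanh(h) = 1; the text then gives theta_bar = h_inf ** 2.
h_inf = bisect(lambda h: h * math.tanh(h) - 1.0, 0.5, 2.0)
theta_bar = h_inf ** 2

print(f"h_inf = {h_inf:.6f}, theta_bar = {theta_bar:.6f}")
# Consistency with the scaled fixed point h_inf = theta_bar * tanh(h_inf).
print(abs(h_inf - theta_bar * math.tanh(h_inf)))
```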

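We now show how to deduce the above expression. Let us introduce the following notation:

$$E_0 = \mathbb{E}^0_+\Big(\frac{1}{\cosh(\bar\theta M/(\Delta-1))}\Big), \qquad E_1 = \mathbb{E}^0_+\Big(\frac{1}{\cosh(\bar\theta M'/(\Delta-1))}\Big), \qquad (147)$$
$$E_2 = \mathbb{E}^0_+\Big(\frac{1}{\cosh(\bar\theta M''/(\Delta-1))}\Big), \qquad F_1 = \mathbb{E}^0_+\Big(\frac{\tanh(\bar\theta M'/(\Delta-1))}{\cosh(\bar\theta M'/(\Delta-1))}\Big), \qquad (148)$$
$$F_2 = \mathbb{E}^0_+\Big(\frac{\tanh(\bar\theta M''/(\Delta-1))}{\cosh(\bar\theta M''/(\Delta-1))}\Big), \qquad D = \mathbb{E}^0_+\big(\cosh(\bar\theta M/(\Delta-1))\big). \qquad (149)$$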
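With this in mind, and recalling that $\theta = \bar\theta/(\Delta-1)$, we can write

$$\begin{aligned}
S(\theta) &= \frac{\tanh h^*}{2D}\Big(\Delta E_1\tanh h^* - \frac{\bar\theta\Delta}{\Delta-1}F_1\Big) && (150\text{-}151)\\
&\quad - \frac{1}{2\Delta D}\Big(1-\frac{\bar\theta}{\Delta-1}\operatorname{sech}^2h^*\Big)\Big(\Delta E_0+\Delta(\Delta-1)\Big(E_2\tanh^2h^*-\frac{2\bar\theta}{\Delta-1}F_2\tanh h^*\Big)\Big) && (152)\\
&\quad + O(\Delta^{-1}) && (153)\\
&= \frac{\tanh^2h^*}{2D}\big(\Delta E_1-(\Delta-1)E_2\big) - \frac{E_0}{2D} - \frac{\Delta}{\Delta-1}\frac{\bar\theta F_1}{2D}\tanh h^* + \frac{\bar\theta F_2}{D}\tanh h^* && (154)\\
&\quad + \frac{\bar\theta E_2}{2D}\operatorname{sech}^2h^*\tanh^2h^* + \frac{\bar\theta}{\Delta-1}\frac{E_0}{2D}\operatorname{sech}^2h^* - \frac{\bar\theta}{\Delta-1}\frac{\bar\theta F_2}{D}\tanh h^*\operatorname{sech}^2h^* + O(\Delta^{-1}). && (155)
\end{aligned}$$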

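Now notice that, expanding the $\cosh(\cdot)$ inside $E_1$ in the expression $\Delta E_1-(\Delta-1)E_2$ around $\bar\theta M''/(\Delta-1)$, we can rewrite the same expression as

$$\Delta E_1-(\Delta-1)E_2 = E_2 - \frac{\Delta}{\Delta-1}\bar\theta\tanh h^*\,F_2 + O(\Delta^{-1}). \qquad (156)$$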

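Inserting this in the above expression finally gives

$$\begin{aligned}
S(\theta) &= \frac{E_2}{2D}\tanh^2h^* - \frac{\Delta}{\Delta-1}\frac{\bar\theta F_2}{2D}\tanh^3h^* - \frac{E_0}{2D} - \frac{\Delta}{\Delta-1}\frac{\bar\theta F_1}{2D}\tanh h^* + \frac{\bar\theta F_2}{D}\tanh h^* && (157)\\
&\quad + \frac{\bar\theta E_2}{2D}\operatorname{sech}^2h^*\tanh^2h^* + O(\Delta^{-1}). && (158)
\end{aligned}$$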
By the law of large numbers we have

$$\lim_{\Delta\to\infty}\frac{M}{\Delta-1} = \lim_{\Delta\to\infty}\frac{M'}{\Delta-1} = \lim_{\Delta\to\infty}\frac{M''}{\Delta-1} \qquad (159)$$
$$= \lim_{\Delta\to\infty}\mathbb{E}_+(X_i)\Big|_{\theta=\bar\theta/(\Delta-1)} = \tanh h_\infty, \qquad (160)$$

and since all the variables inside the expectations are uniformly bounded, we can take the limit inside all the expectations of our expression for $S(\theta)$. Doing so we get

lim S{e/iA - D) = ^^^^ - '-^^^ L (161) 

o^oc 2cosh2/i^ 2cosh2/i^ 2cosh2/i^ 2cosh2/i^ 

etanli^h* ^tanh^/i* 

+ 2 — - + (162) 

cosh /i^ 2 cosh h%o 

If we now use the relation $h_\infty = \bar\theta\tanh h_\infty$, this expression can be simplified to

1 {-1 + hi, tanh hi,). (163) 



2 cosh^ /i^ 

Finally, we show that there exists a constant $C_{\min}$ such that

$$\lim_{p\to\infty}\sigma_{\min}(Q^*_{SS}) = \sigma_{\min}\Big(\lim_{p\to\infty}Q^*_{SS}\Big) \ge C_{\min}. \qquad (164)$$

The equality is guaranteed since the matrices in the sequence $\{Q^*_{SS}\}_{p\ge1}$ have fixed, finite dimensions.
Figure 5: Example of a small graph for which the incoherence condition fails.



First notice that the eigenvalues of $\lim_{p\to\infty}Q^*_{SS}$ are $a-b$ and $a+(\Delta-1)b = c_1-c_2$. It is immediate to see that $a-b>0$. In addition, since $\Delta(c_1-c_2) = \mathbb{E}_+\big(M^2/\cosh^2(\theta M)\big) > 0$, it follows that $c_1-c_2>0$. Hence we can choose $C_{\min} = \min\{a-b,\,c_1-c_2\} > 0$.

□ 
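The argument above, and Lemma D.1 below, use the fact that a matrix with entries $(a-b)\delta_{ij}+b$ has eigenvalues $a-b$ (with multiplicity $n-1$) and $a+(n-1)b$. A quick numerical check, with arbitrary illustrative values of $a$ and $b$:

```python
import numpy as np


def equicorrelation(n, a, b):
    """Matrix with entries (a - b) * delta_ij + b: a on the diagonal, b elsewhere."""
    return (a - b) * np.eye(n) + b * np.ones((n, n))


n, a, b = 4, 0.9, 0.25
eig = np.sort(np.linalg.eigvalsh(equicorrelation(n, a, b)))
expected = np.sort([a - b] * (n - 1) + [a + (n - 1) * b])
print(eig, expected)
```

The eigenvector for $a+(n-1)b$ is the all-ones vector; any vector orthogonal to it gives $a-b$.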

D Analysis of Rlr(λ) for other families of graphs

As already discussed, the success of Rlr is closely related to the incoherence condition. For small graphs, brute-force computations allow us to evaluate this condition explicitly. For example, consider the reconstruction of the neighborhood of the leftmost node in the graph of Figure 5. The corresponding incoherence parameter takes the form

$$\|Q^*_{S^CS}(Q^*_{SS})^{-1}\|_\infty = \frac{3x(1+x^2)}{1+3x^2}, \qquad (165)$$

where $x = \tanh\theta$. For $x > x^* = \frac13\big(1-2^{1/3}+2^{2/3}\big) \approx 0.44249$ (i.e., for $\theta > \operatorname{atanh}(x^*) \approx 0.475327$) the right-hand side is larger than 1, whence the incoherence condition is violated: $\|Q^*_{S^CS}(Q^*_{SS})^{-1}\|_\infty > 1$. This simple calculation strongly suggests that Rlr(λ) fails on the graph of Figure 5 for $\theta > \operatorname{atanh}(x^*)$, although it does not provide a complete proof of this failure. In this appendix we study three classes of graphs of increasing size. We show that, with high probability, Rlr succeeds in reconstructing trees. On the other hand, we show that it fails, for $\theta$ large enough, at reconstructing large two-dimensional grids, and that it fails at reconstructing the graphs $G_p$ from the toy example in Section 1.1.
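Eq. (165) can be reproduced by exhaustive enumeration. The sketch assumes that the graph of Figure 5 is $G_5$ from the toy example (nodes 1 and 2 each joined to nodes 3, 4 and 5, uniform coupling $\theta$) and that the leftmost node is node 1, so that $S=\{3,4,5\}$, $S^C=\{2\}$ and $Q^*_{ij} = \mathbb{E}(X_iX_j/\cosh^2(\theta M))$ with $M = X_3+X_4+X_5$. Function names are hypothetical.

```python
import itertools
import math

import numpy as np


def incoherence_gp(p, theta):
    """||Q*_{S^C S} (Q*_{SS})^{-1}||_inf for node 1 of G_p, by exhaustive enumeration.

    Assumed structure of G_p: nodes 1 and 2 are each joined to every node in 3..p,
    all couplings equal theta. 0-based indices: node 1 -> 0, node 2 -> 1, nodes 3..p -> 2..p-1.
    """
    edges = [(i, j) for i in (0, 1) for j in range(2, p)]
    S = list(range(2, p))   # true neighbourhood of node 1
    Sc = [1]                # the only non-neighbour of node 1
    Q = np.zeros((p, p))
    Z = 0.0
    for x in itertools.product((-1, 1), repeat=p):
        w = math.exp(theta * sum(x[i] * x[j] for i, j in edges))
        M = sum(x[j] for j in S)  # the local field on node 1 is theta * M
        v = np.array(x, dtype=float)
        Q += w * np.outer(v, v) / math.cosh(theta * M) ** 2
        Z += w
    Q /= Z
    A = Q[np.ix_(Sc, S)] @ np.linalg.inv(Q[np.ix_(S, S)])
    return np.abs(A).sum(axis=1).max()


def incoherence_g5_closed(theta):
    """Closed form (165) for p = 5, with x = tanh(theta)."""
    x = math.tanh(theta)
    return 3 * x * (1 + x * x) / (1 + 3 * x * x)


x_star = (1 - 2 ** (1 / 3) + 2 ** (2 / 3)) / 3
print(x_star, math.atanh(x_star))
for theta in (0.40, 0.47, 0.48, 0.60):
    print(theta, incoherence_gp(5, theta), incoherence_g5_closed(theta))
```

Enumerating all $2^p$ configurations is only practical for small $p$, but it involves no approximation.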

D.1 Trees

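Lemma D.1. If $G$ is a tree rooted at $r$ with depth $>1$ and node $r$ has degree $\Delta > 1$, then, for this node,

$$\|Q^*_{S^CS}(Q^*_{SS})^{-1}\|_\infty = \tanh\theta < 1, \qquad (166)$$

$\sigma_{\min}(Q^*_{SS}) \ge (1-\tanh^2\theta)/\cosh^2(\theta\Delta)$ and $\sigma_{\max}(\mathbb{E}_{G,\theta}(X_SX_S^{T})) = 1+(\Delta-1)\tanh^2\theta$.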

Proof. In what follows $\mathbb{E}$ will denote $\mathbb{E}_{G,\theta}$. Consider a node $r'\in S^C$ and let $k\in S$ be the unique node in $S$ that belongs to the shortest path connecting $r'$ to $r$. Let $t$ be the distance between $r'$ and $k$. For every $i\in S$ one can write

$$Q^*_{r'i} = \mathbb{E}\big(X_{r'}X_i/\cosh^2(\theta M)\big) = \mathbb{E}(X_{r'}X_k)\,\mathbb{E}\big(X_kX_i/\cosh^2(\theta M)\big) = (\tanh\theta)^t\,\mathbb{E}\big(X_kX_i/\cosh^2(\theta M)\big).$$

This equation is still valid if $k = i$. We can thus write $Q^*_{r'S} = (\tanh\theta)^tQ^*_{kS}$ and hence $Q^*_{r'S}(Q^*_{SS})^{-1} = (\tanh\theta)^te_k$, where $e_k$ is a row vector with all entries equal to zero except the $k$-th entry, which equals 1. Therefore $\|Q^*_{r'S}(Q^*_{SS})^{-1}\|_1 = (\tanh\theta)^t$. Since there must exist at least one node $r'\in S^C$ whose corresponding node $k$ is at distance 1 from it, that is, for which $t = 1$, we conclude that $\|Q^*_{S^CS}(Q^*_{SS})^{-1}\|_\infty = \tanh\theta < 1$.

Figure 6: Labeling of the nodes in the grid.

To prove the spectral bounds, first notice that the positive-semidefinite matrix $Q^*_{SS}$ has entries $Q^*_{ij} = (a-b)\delta_{ij}+b$, where $a = \mathbb{E}(1/\cosh^2(\theta M))$ and $b = \mathbb{E}(X_1X_2/\cosh^2(\theta M))$, and where 1 and 2 are any two distinct nodes in $S$. A matrix of this form has eigenvalues $a-b$ and $a+(\Delta-1)b$. It is not hard to see that $b > 0$, and hence

$$\sigma_{\min}(Q^*_{SS}) = a-b = \mathbb{E}\big((1-X_1X_2)/\cosh^2(\theta M)\big) \ge \mathbb{E}(1-X_1X_2)/\cosh^2(\theta\Delta). \qquad (167)$$

Since $\mathbb{E}(1-X_1X_2) = 1-\tanh^2\theta$, the lower bound follows.

The computation of the maximum eigenvalue of $\mathbb{E}_{G,\theta}(X_SX_S^{T})$ is trivial, since this matrix is also of the form $(a-b)\delta_{ij}+b$, with $a = 1$ and $b = \tanh^2\theta$.

□ 
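Lemma D.1 can be checked on a concrete tree. The sketch below uses a hypothetical example, a root with $\Delta = 3$ children and one grandchild below each child, and computes $Q^*$ and $\mathbb{E}(X_SX_S^T)$ by enumerating all $2^7$ configurations.

```python
import itertools
import math

import numpy as np


def tree_quantities(theta, delta=3):
    """Brute-force check of Lemma D.1 on one small tree (a hypothetical example):
    root 0 with children 1..delta, and one grandchild c + delta below each child c."""
    children = list(range(1, delta + 1))
    edges = [(0, c) for c in children] + [(c, c + delta) for c in children]
    p = 2 * delta + 1
    Q = np.zeros((p, p))  # Q*_{ij} = E(X_i X_j / cosh^2(theta M))
    C = np.zeros((p, p))  # E(X_i X_j)
    Z = 0.0
    for x in itertools.product((-1, 1), repeat=p):
        w = math.exp(theta * sum(x[i] * x[j] for i, j in edges))
        M = sum(x[c] for c in children)
        v = np.array(x, dtype=float)
        Q += w * np.outer(v, v) / math.cosh(theta * M) ** 2
        C += w * np.outer(v, v)
        Z += w
    Q /= Z
    C /= Z
    S = children
    Sc = [c + delta for c in children]
    A = Q[np.ix_(Sc, S)] @ np.linalg.inv(Q[np.ix_(S, S)])
    incoherence = np.abs(A).sum(axis=1).max()
    sigma_min = np.linalg.eigvalsh(Q[np.ix_(S, S)]).min()
    sigma_max_cov = np.linalg.eigvalsh(C[np.ix_(S, S)]).max()
    return incoherence, sigma_min, sigma_max_cov


theta, delta = 0.6, 3
inc, smin, smax = tree_quantities(theta, delta)
t = math.tanh(theta)
print(inc, t)
print(smin, (1 - t * t) / math.cosh(theta * delta) ** 2)
print(smax, 1 + (delta - 1) * t * t)
```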

D.2 Two-dimensional grids 

Lemma D.2. If $G$ is a two-dimensional grid with periodic boundary conditions (each node connects to its four closest neighbors), then for $p$ large enough and $\theta > \theta_c$ we have $\|Q^*_{S^CS}(Q^*_{SS})^{-1}\|_\infty > 1+\epsilon$ and $\sigma_{\min}(Q^*_{SS}) > C_{\min}$, where $\theta_c$, $\epsilon > 0$ and $C_{\min}$ are independent of $p$.

Proof. We shall compute a lower bound on $\|Q^*_{S^CS}(Q^*_{SS})^{-1}\|_\infty$ by means of a low-temperature expansion, i.e., a Taylor expansion in powers of $e^{-\theta}$. We will show that for this lower bound the lemma holds.

Label the central node as node 0 and the neighboring nodes as 1, 2, 3 and 4. Denote as node 5 the common neighbor of nodes 1 and 4. Throughout this proof we will denote $\mathbb{E}_{G,\theta}$ by $\mathbb{E}$ and $\mathbb{P}_{G,\theta}$ by $\mathbb{P}$.

First notice that, due to the periodic boundary conditions, there is symmetry along the vertical and horizontal axes of the lattice. Knowing this, the matrix $Q^*_{SS}$ can be written as

$$Q^*_{SS} = \begin{pmatrix} a & b & c & b\\ b & a & b & c\\ c & b & a & b\\ b & c & b & a \end{pmatrix}, \qquad (168)$$

where $a = \mathbb{E}(1/\cosh^2(\theta M))$, $b = \mathbb{E}(X_1X_2/\cosh^2(\theta M))$ and $c = \mathbb{E}(X_1X_3/\cosh^2(\theta M))$, with $M = \sum_{i\in\partial 0}X_i$; that is, $M$ is the sum of the variables in the neighborhood of node 0 (node 0 not included). Since we only want to prove a lower bound on $\|Q^*_{S^CS}(Q^*_{SS})^{-1}\|_\infty$, we only consider the row of $Q^*_{S^CS}$ associated with node 5. This row has the form

$$[\,d\ \ e\ \ e\ \ d\,], \qquad (169)$$

where $d = \mathbb{E}(X_1X_5/\cosh^2(\theta M))$ and $e = \mathbb{E}(X_2X_5/\cosh^2(\theta M))$. To compute the low-temperature expansions of each of these quantities we first write

$$a = \mathbb{P}(|M|=0) + \frac{1}{\cosh^2 2\theta}\,\mathbb{P}(|M|=2) + \frac{1}{\cosh^2 4\theta}\,\mathbb{P}(|M|=4), \qquad (170)$$
$$b = \mathbb{E}\Big(\frac{X_1X_2}{\cosh^2(\theta M)}\Big) = \big[\mathbb{P}(|M|=0,X_1X_2=1)-\mathbb{P}(|M|=0,X_1X_2=-1)\big] \qquad (171)$$
$$\qquad + \frac{1}{\cosh^2 2\theta}\big[\mathbb{P}(|M|=2,X_1X_2=1)-\mathbb{P}(|M|=2,X_1X_2=-1)\big] \qquad (172)$$
$$\qquad + \frac{1}{\cosh^2 4\theta}\big[\mathbb{P}(|M|=4,X_1X_2=1)-\mathbb{P}(|M|=4,X_1X_2=-1)\big]. \qquad (173)$$

Figure 7: Basic types of configurations for the calculation of P(|M| = 0). The number in front of each picture represents the number of equivalent symmetric configurations that need to be taken into account.

The problem thus reduces to the computation of the above probabilities. We will exemplify the calculation of the low-temperature expansion of $\mathbb{P}(|M|=0)$; the expansions of the other terms follow in a similar fashion.

Let $\mathcal{H}(x) = \sum_{(ij)\in E}x_ix_j$, $\mathcal{H}_{\max} = \max_x\mathcal{H}(x) = |E|$, and $\delta\mathcal{H}(x) = \mathcal{H}(x)-\mathcal{H}_{\max} = -2V(x)$, where $V(x)$ is the length of the boundary separating positive spins from negative spins in configuration $x$. Then

$$\mathbb{P}(|M|=0) = \frac{2}{Z}\sum_{\{x:\,x_0=1,\,M=0\}}e^{\theta\mathcal{H}(x)} = \frac{2}{Z}\,e^{\theta\mathcal{H}_{\max}}\sum_{s\ge4}\ \sum_{\{x:\,x_0=1,\,M=0,\,V(x)=s\}}e^{-2\theta s}.$$

The term $2e^{\theta\mathcal{H}_{\max}}/Z$ appears in all of $a$, $b$, $c$, $d$ and $e$ and is thus irrelevant for the computation of $[Q^*_{S^CS}(Q^*_{SS})^{-1}]_5$. Since only configurations with $M = 0$ contribute to the sum, there are two basic types of configurations we need to consider, both of which must have exactly two neighbors of node 0 with negative spin. These are represented in Figure 7. Starting from these two basic states we need to consider the first few lowest-energy configurations. To help the counting there are two parameters that we keep track of: the number of negative spins, $t$, and the perimeter of the boundary, $s$. The first type of state produces the counting in Table 1. The associated configurations are represented in Figure 8.
Figure 8: Configurations derived from the first basic type of configuration for the calculation of P(|M| = 0).



Table 1: Low-energy states from the first basic configuration for the low-temperature expansion of P(|M| = 0)

Negative spins, t    Boundary perimeter, s    Number of states
2                    8                        4 × 1
3                    8                        4 × 1
3                    10                       4 × 4
4                    10                       4 × 6
5                    10                       4 × 2



For the second type of basic configuration the counting is given in Table 2, and the associated configurations are shown in Figure 9.

Table 2: Low-energy states from the second basic configuration for the low-temperature expansion of P(|M| = 0)

Negative spins, t    Boundary perimeter, s    Number of states
2                    8                        2 × 1
3                    10                       2 × 6



We can thus write

$$\mathbb{P}(|M|=0) = \frac{2}{Z}\,e^{\theta\mathcal{H}_{\max}}\big(10e^{-16\theta}+60e^{-20\theta}+O(e^{-24\theta})\big). \qquad (176)$$

For the expansion of $\mathbb{P}(|M|=2)$ we also have two basic state types from which all the other ones are built. The first type has only one negative spin in the neighborhood of node 0 and the second type has three negative spins in the neighborhood of node 0; see Figure 10.

The counting of states derived from the first and second basic state types is recorded in Tables 3 and 4, respectively.

Table 3: Low-energy states from the first basic configuration for the calculation of P(|M| = 2)

Negative spins, t    Boundary perimeter, s    Number of states
1                    4                        4 × 1
2                    6                        4 × 3
2                    8                        4 × (|E| − 8)
3                    8                        4 × 10
4                    8                        4 × 2



We can thus write

$$\mathbb{P}(|M|=2) = \frac{2}{Z}\,e^{\theta\mathcal{H}_{\max}}\big(4e^{-8\theta}+12e^{-12\theta}+O(e^{-16\theta})\big). \qquad (177)$$
Figure 9: Configurations derived from the second basic type of configuration for the calculation of P(|M| = 0).
Figure 10: Basic types of configurations for the calculation of P(|M| = 2). The number in front of each picture represents the number of equivalent symmetric configurations that need to be taken into account.
Table 4: Low-energy states from the second basic configuration for the calculation of P(|M| = 2)

Negative spins, t    Boundary perimeter, s    Number of states
3                    12                       4 × 1



For the expansion of $\mathbb{P}(|M|=4)$ we again have two basic state types from which all the other ones are built. The first type has all spins positive in the neighborhood of node 0 and the second type has all spins negative in the neighborhood of node 0. The counting of states is given in Table 5.



Table 5: Low-energy states for the calculation of P(|M| = 4)

Negative spins, t    Boundary perimeter, s    Number of states
1                    4                        |E| − 5
4                    16                       1



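We thus have

$$\mathbb{P}(|M|=4) = \frac{2}{Z}\,e^{\theta\mathcal{H}_{\max}}\big(1+(|E|-5)e^{-8\theta}+O(e^{-12\theta})\big). \qquad (178)$$

Using the expansion $1/\cosh^2(x) = 4e^{-2x}\big(1-2e^{-2x}+3e^{-4x}+O(e^{-6x})\big)$, these expansions can be inserted in (170) to obtain the low-temperature expansion of $a$.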

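For the probabilities involved in the calculation of $b$ we get the following expansions,

$$\mathbb{P}(|M|=0,X_1X_2=1) = \frac{2}{Z}\,e^{\theta\mathcal{H}_{\max}}\big(4e^{-16\theta}+24e^{-20\theta}+O(e^{-24\theta})\big), \qquad (180)$$
$$\mathbb{P}(|M|=0,X_1X_2=-1) = \frac{2}{Z}\,e^{\theta\mathcal{H}_{\max}}\big(6e^{-16\theta}+36e^{-20\theta}+O(e^{-24\theta})\big), \qquad (181)$$
$$\mathbb{P}(|M|=2,X_1X_2=1) = \frac{2}{Z}\,e^{\theta\mathcal{H}_{\max}}\big(2e^{-8\theta}+6e^{-12\theta}+O(e^{-16\theta})\big), \qquad (182)$$
$$\mathbb{P}(|M|=2,X_1X_2=-1) = \frac{2}{Z}\,e^{\theta\mathcal{H}_{\max}}\big(2e^{-8\theta}+6e^{-12\theta}+O(e^{-16\theta})\big), \qquad (183)$$
$$\mathbb{P}(|M|=4,X_1X_2=1) = \frac{2}{Z}\,e^{\theta\mathcal{H}_{\max}}\big(1+(|E|-5)e^{-8\theta}+O(e^{-12\theta})\big), \qquad (184)$$
$$\mathbb{P}(|M|=4,X_1X_2=-1) = 0, \qquad (185)$$

and putting everything together we obtain

$$b = \frac{2}{Z}\,e^{\theta\mathcal{H}_{\max}}\big(4e^{-8\theta}+(4|E|-30)e^{-16\theta}+O(e^{-20\theta})\big). \qquad (186)$$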
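For the probabilities involved in the calculation of $c$ we get the following expansions,

$$\mathbb{P}(|M|=0,X_1X_3=1) = \frac{2}{Z}\,e^{\theta\mathcal{H}_{\max}}\big(2e^{-16\theta}+12e^{-20\theta}+O(e^{-24\theta})\big), \qquad (187)$$
$$\mathbb{P}(|M|=0,X_1X_3=-1) = \frac{2}{Z}\,e^{\theta\mathcal{H}_{\max}}\big(8e^{-16\theta}+48e^{-20\theta}+O(e^{-24\theta})\big), \qquad (188)$$
$$\mathbb{P}(|M|=2,X_1X_3=1) = \frac{2}{Z}\,e^{\theta\mathcal{H}_{\max}}\big(2e^{-8\theta}+6e^{-12\theta}+O(e^{-16\theta})\big), \qquad (189)$$
$$\mathbb{P}(|M|=2,X_1X_3=-1) = \frac{2}{Z}\,e^{\theta\mathcal{H}_{\max}}\big(2e^{-8\theta}+6e^{-12\theta}+O(e^{-16\theta})\big), \qquad (190)$$
$$\mathbb{P}(|M|=4,X_1X_3=1) = \frac{2}{Z}\,e^{\theta\mathcal{H}_{\max}}\big(1+(|E|-5)e^{-8\theta}+O(e^{-12\theta})\big), \qquad (191)$$
$$\mathbb{P}(|M|=4,X_1X_3=-1) = 0, \qquad (192)$$

and putting everything together we obtain

$$c = \frac{2}{Z}\,e^{\theta\mathcal{H}_{\max}}\big(4e^{-8\theta}+(4|E|-34)e^{-16\theta}+O(e^{-20\theta})\big). \qquad (193)$$

For the probabilities involved in the calculation of $d$ we get the following expansions,

$$\mathbb{P}(|M|=0,X_1X_5=1) = \frac{2}{Z}\,e^{\theta\mathcal{H}_{\max}}\big(6e^{-16\theta}+38e^{-20\theta}+O(e^{-24\theta})\big), \qquad (194)$$
$$\mathbb{P}(|M|=0,X_1X_5=-1) = \frac{2}{Z}\,e^{\theta\mathcal{H}_{\max}}\big(4e^{-16\theta}+22e^{-20\theta}+O(e^{-24\theta})\big), \qquad (195)$$
$$\mathbb{P}(|M|=2,X_1X_5=1) = \frac{2}{Z}\,e^{\theta\mathcal{H}_{\max}}\big(3e^{-8\theta}+9e^{-12\theta}+O(e^{-16\theta})\big), \qquad (196)$$
$$\mathbb{P}(|M|=2,X_1X_5=-1) = \frac{2}{Z}\,e^{\theta\mathcal{H}_{\max}}\big(e^{-8\theta}+3e^{-12\theta}+O(e^{-16\theta})\big), \qquad (197)$$
$$\mathbb{P}(|M|=4,X_1X_5=1) = \frac{2}{Z}\,e^{\theta\mathcal{H}_{\max}}\big(1+(|E|-6)e^{-8\theta}+O(e^{-12\theta})\big), \qquad (198)$$
$$\mathbb{P}(|M|=4,X_1X_5=-1) = \frac{2}{Z}\,e^{\theta\mathcal{H}_{\max}}\big(e^{-8\theta}+O(e^{-12\theta})\big), \qquad (199)$$

and putting everything together we obtain

$$d = \frac{2}{Z}\,e^{\theta\mathcal{H}_{\max}}\big(4e^{-8\theta}+8e^{-12\theta}+(4|E|-26)e^{-16\theta}+O(e^{-20\theta})\big). \qquad (200)$$

For the probabilities involved in the calculation of $e$ we get the following expansions,

$$\mathbb{P}(|M|=0,X_2X_5=1) = \frac{2}{Z}\,e^{\theta\mathcal{H}_{\max}}\big(4e^{-16\theta}+22e^{-20\theta}+O(e^{-24\theta})\big), \qquad (201)$$
$$\mathbb{P}(|M|=0,X_2X_5=-1) = \frac{2}{Z}\,e^{\theta\mathcal{H}_{\max}}\big(6e^{-16\theta}+38e^{-20\theta}+O(e^{-24\theta})\big), \qquad (202)$$
$$\mathbb{P}(|M|=2,X_2X_5=1) = \frac{2}{Z}\,e^{\theta\mathcal{H}_{\max}}\big(3e^{-8\theta}+7e^{-12\theta}+O(e^{-16\theta})\big), \qquad (203)$$
$$\mathbb{P}(|M|=2,X_2X_5=-1) = \frac{2}{Z}\,e^{\theta\mathcal{H}_{\max}}\big(e^{-8\theta}+5e^{-12\theta}+O(e^{-16\theta})\big), \qquad (204)$$
$$\mathbb{P}(|M|=4,X_2X_5=1) = \frac{2}{Z}\,e^{\theta\mathcal{H}_{\max}}\big(1+(|E|-6)e^{-8\theta}+O(e^{-12\theta})\big), \qquad (205)$$
$$\mathbb{P}(|M|=4,X_2X_5=-1) = \frac{2}{Z}\,e^{\theta\mathcal{H}_{\max}}\big(e^{-8\theta}+O(e^{-12\theta})\big), \qquad (206)$$

and putting everything together we obtain

$$e = \frac{2}{Z}\,e^{\theta\mathcal{H}_{\max}}\big(4e^{-8\theta}+8e^{-12\theta}+(4|E|-46)e^{-16\theta}+O(e^{-20\theta})\big). \qquad (207)$$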




Following the ideas of [31], one can then show that the above formal expansion converges (a priori it could be the case that one of the higher-order terms depends on $|E|$). This finishes the first part of the proof.
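Combining the individual probabilities into (186), (193), (200) and (207) is bookkeeping with truncated power series in $u = e^{-4\theta}$: the common factor $2e^{\theta\mathcal{H}_{\max}}/Z$ is dropped, each difference $\mathbb{P}(\cdot,XX'=1)-\mathbb{P}(\cdot,XX'=-1)$ is weighted by the expansions of $1/\cosh^2(2\theta)$ and $1/\cosh^2(4\theta)$ that follow from $1/\cosh^2(x) = 4e^{-2x}(1-2e^{-2x}+3e^{-4x}+\dots)$, and everything beyond $u^4 = e^{-16\theta}$ is discarded. A minimal sketch, with coefficient lists transcribed from (180)-(206) and hypothetical helper names:

```python
def mul(p, q, order):
    """Product of two truncated power series in u = exp(-4*theta), kept up to u**order."""
    out = [0] * (order + 1)
    for i, pi in enumerate(p):
        for j, qj in enumerate(q):
            if i + j <= order:
                out[i + j] += pi * qj
    return out


def add(*series):
    """Coefficient-wise sum of power series of possibly different lengths."""
    n = max(len(s) for s in series)
    return [sum(s[k] for s in series if k < len(s)) for k in range(n)]


ORDER = 4                   # keep terms up to u**4 = exp(-16*theta)
SECH2_2 = [0, 4, -8, 12]    # 1/cosh^2(2 theta) = 4u - 8u^2 + 12u^3 + O(u^4)
SECH2_4 = [0, 0, 4, 0, -8]  # 1/cosh^2(4 theta) = 4u^2 - 8u^4 + O(u^6)


def entry(p0, p2, p4):
    """Combine P(|M|=k, XX'=1) - P(|M|=k, XX'=-1) for k = 0, 2, 4, as in (171)-(173)."""
    return add(p0[:ORDER + 1], mul(p2, SECH2_2, ORDER), mul(p4, SECH2_4, ORDER))


def grid_entries(E):
    """Series coefficients of b, c, d, e (common prefactor dropped) for |E| = E."""
    b = entry([0, 0, 0, 0, 4 - 6, 24 - 36], [0, 0, 2 - 2, 6 - 6], [1, 0, E - 5])
    c = entry([0, 0, 0, 0, 2 - 8, 12 - 48], [0, 0, 2 - 2, 6 - 6], [1, 0, E - 5])
    d = entry([0, 0, 0, 0, 6 - 4, 38 - 22], [0, 0, 3 - 1, 9 - 3], [1, 0, (E - 6) - 1])
    e = entry([0, 0, 0, 0, 4 - 6, 22 - 38], [0, 0, 3 - 1, 7 - 5], [1, 0, (E - 6) - 1])
    return b, c, d, e


print(grid_entries(100))
```

With $|E| = 100$ the $u^4$ coefficients come out as $370$, $366$, $374$ and $354$, i.e. $4|E|-30$, $4|E|-34$, $4|E|-26$ and $4|E|-46$ for $b$, $c$, $d$ and $e$.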

We now prove that there exists $C_{\min} > 0$ such that $\lim_{p\to\infty}\sigma_{\min}(Q^*_{SS}) > C_{\min}$. This will prove the second part of the lemma. First notice that the eigenvalues of $Q^*_{SS}$ are $\{a-c,\ a+2b+c,\ a-2b+c\}$. Now notice that

$$a-c = \mathbb{E}\Big(\frac{1-X_1X_3}{\cosh^2(\theta M)}\Big), \qquad (209)$$
$$a+2b+c = \mathbb{E}\Big(\frac{M^2}{4\cosh^2(\theta M)}\Big), \qquad (210)$$
$$a-2b+c = \mathbb{E}\Big(\frac{(X_1+X_3-X_2-X_4)^2}{4\cosh^2(\theta M)}\Big), \qquad (211)$$

where for $a+2b+c$ and $a-2b+c$ we made use of the symmetry of the lattice. Since $1-X_1X_3$, $M$ and $X_1+X_3-X_2-X_4$ only depend on a fixed finite number of spins, and since $\theta < \infty$, there is a positive probability, independent of $p$, of their being non-zero. Hence, all eigenvalues of $Q^*_{SS}$ are strictly positive even as $p\to\infty$.

□

D.3 Graphs $G_p$ from the toy example

In this section we show that Rlr(λ) fails to reconstruct the graphs $G_p$ defined in Section 1.1 (see Figure 1) for all $\lambda$ when $\theta$ is large enough. Note that this differs from the previous analysis in the sense that we do not require that $\lambda\to0$. We also show that this 'critical' $\theta$ behaves like $1/\Delta$ for large $\Delta$. Our analysis is based on numerical evaluation of functions for which explicit analytic expressions can be given along the lines of Section A. Hence, our argument should be understood as a sketch of a proof.

The success of Rlr(λ) is dictated by the behavior of $L(\theta;\{x^{(\ell)}\}_{\ell=1}^n)$ when $n$ is large. In fact, it is easy to use concentration inequalities to show that the solution of Rlr for finite $n$ converges with high probability to the minimum of $L_\infty(\theta)+\lambda\|\theta\|_1$, where $L_\infty(\theta) = \lim_{n\to\infty}L(\theta;\{x^{(\ell)}\}_{\ell=1}^n)$.

If $\lambda\to0$ as $n\to\infty$, we have seen that the success of Rlr is dictated by the incoherence condition, which in turn is determined by the Hessian of $L_\infty(\theta)$. It is not hard to see that, for this family of graphs, $\|Q^*_{S^CS}(Q^*_{SS})^{-1}\|_\infty$ is increasing with $p$. For $p = 5$, Eq. (165) tells us that the incoherence condition is violated for $\theta$ high enough. Hence, by Lemma 4.1, Rlr will fail for all $G_p$ ($p\ge5$) when $\lambda\to0$ as $n\to\infty$. The question now is: how does Rlr(λ) behave if $\lambda\to0$ does not hold?

If $\lambda$ is bounded below by a positive constant, the success of Rlr is dictated by the minima of $L_\infty(\theta)+\lambda\|\theta\|_1$. For this specific family of graphs, it is also not hard to see that for $0<\theta<\infty$, $L_\infty$ is strictly convex and that, due to symmetry, the unique minimum of $L_\infty(\theta)+\lambda\|\theta\|_1$ must satisfy $\theta_{13} = \theta_{14} = \dots = \theta_{1p}$ for any $\lambda$. This allows us to consider $L_\infty(\theta)$ as a function of only two parameters. We call it
$L'(\theta_{13},\theta_{12}) = L(\theta_{12},\theta_{13},\theta_{13},\dots,\theta_{13})$. Now the problem of understanding Rlr for $\lambda>0$, large $n$ and any $p$ becomes tractable and reduces to understanding the following problem:

$$\min_{\theta_{13},\theta_{12}}\; L'(\theta_{13},\theta_{12}) + \lambda(p-2)|\theta_{13}| + \lambda|\theta_{12}|. \qquad (212)$$

We can analyze this optimization problem by solving it numerically. Figure 12 shows the solution path of this problem as a function of $\lambda$ for $p = 5$ and for different values of $\theta$.

From the plots we see that for high values of $\theta$, Rlr never yields a correct reconstruction (unless we assume $\lambda = 0$), since for these values of $\theta$ all curves lie strictly above the horizontal axis, that is, $\theta_{12} > 0$. However, if $\theta$ is below a certain value, call it $\theta_T$ ($\theta_T\approx0.61$ for the graph $G_5$), then there are solutions that yield a correct reconstruction if we choose values of $\lambda > 0$. In fact, for $\theta<\theta_T$ all curves exhibit a portion (above a certain $\lambda$) that has $\theta_{12} = 0$ and $\theta_{13} > 0$. That is, for $\theta<\theta_T$, Rlr makes a correct structural reconstruction. If we make $\theta$ even smaller, then the curves coincide with the horizontal axis. We call $\theta_L$ the value of $\theta$ below which this occurs.

We again note that all previous considerations were made in the limit $n\to\infty$. For large but finite $n$, with high probability the solution curves will not be the ones plotted but rather random fluctuations around them. For $\lambda = 0$, finite $n$ and $\theta>\theta_L$, the solution curves no longer start from $\theta = \theta^* = (\theta,0)$ but have a positive, non-vanishing probability of having $\theta_{12} > 0$. This reflects the fact that for finite $n$ the success of Rlr(λ) requires $\lambda$ to be positive. However, for $\theta<\theta_L$ and $\lambda>0$ such that we are in the region where the curves for $n=\infty$ are identically zero, the curves for finite $n$ will have an increasing probability of being identically zero too. Thus, for these values of $\lambda$ and $\theta$, the probability of successful reconstruction tends to 1 as $n\to\infty$. From the plots we also conclude that, unless the whole curve (for $n=\infty$) coincides with zero, Rlr(λ) restricted to the assumption $\lambda\to0$ fails with positive, non-vanishing probability for finite $n$. For $\theta<\theta_L$, when the curves (for $n=\infty$) become identically zero, there is a scaling of $\lambda$ with $n$ to zero that allows for a probability of success converging to 1 as $n\to\infty$.

Requiring $\lambda\to0$ makes $\theta_L$ the critical value above which reconstruction with Rlr fails. This is the scenario in which we studied Rlr in Section 2.2.2. In fact, $\theta_L$ coincides with the value above which $\|Q^*_{S^CS}(Q^*_{SS})^{-1}\|_\infty > 1$. For this family of graphs we thus conclude that the true condition required for successful reconstruction is not $\|Q^*_{S^CS}(Q^*_{SS})^{-1}\|_\infty < 1$ but rather $\theta<\theta_T$. Surprisingly, for graphs in $G_p$ this condition coincides with $\mathbb{E}_{G,\theta}(X_1X_3) > \mathbb{E}_{G,\theta}(X_1X_2)$, i.e., the correlation between neighboring nodes must be bigger than that between non-neighboring nodes. Notice that this is precisely the condition required for Thr to work. Consequently, for this family of graphs, the thresholding algorithm always has a working range in terms of $\theta$ larger than that of Rlr when the latter is restricted to $\lambda\to0$. In fact, a simple calculation using the local weak convergence argument from the proof of Lemma 4.3 shows that, with high probability, for large random regular graphs the correlation between neighboring nodes is always strictly greater than between non-neighboring nodes. This shows that the thresholding algorithm has operating range $\theta\in(0,\infty)$ for random regular graphs, compared to $\theta\in(0,\theta_{\rm thr}(\Delta))$ for Rlr.

Figure 11: For this family of graphs of increasing maximum degree Δ, Rlr(λ) will fail for any λ > 0 if θ > K/Δ, where K is a large enough constant.

Figure 12: Solution curves of Rlr(λ) as a function of λ for different values of θ and p = 5. Along each curve, λ increases from right to left. Plot points separated by δλ = 0.05 are included to show the speed of the parameterization with λ. For λ → ∞ all curves tend to the point (0, 0). Remark: curves like the one for θ = 0.55 are identically zero above a certain value of λ.

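We will now prove that, for large enough $\Delta = p-2$, there is a unique $\theta_T(\Delta)$ (solution of $\mathbb{E}_{G,\theta,\Delta}(X_1X_3) = \mathbb{E}_{G,\theta,\Delta}(X_1X_2)$) that scales like $1/\Delta$ and above which $\mathbb{E}_{G,\theta,\Delta}(X_1X_3) < \mathbb{E}_{G,\theta,\Delta}(X_1X_2)$. Let 1 and 2 be the two nodes with degree greater than 2 and let 3 be any other node (of degree 2); see Figure 11. Define $x_\Delta = \mathbb{E}_{G,\theta,\Delta}(X_1X_2)$ and $y_\Delta = \mathbb{E}_{G,\theta,\Delta}(X_1X_3)$. It is not hard to see that

$$x_{\Delta+1} = \frac{x_\Delta+\tanh^2\theta}{1+\tanh^2\theta\,x_\Delta}, \qquad y_{\Delta+1} = \frac{\tanh\theta\,x_\Delta+\tanh\theta}{1+\tanh^2\theta\,x_\Delta}. \qquad (213)$$

From these expressions we see that the condition $x_\Delta(\theta) > y_\Delta(\theta)$ is equivalent to $x_{\Delta-1}(\theta) > \tanh\theta$. Remembering that expectations in the Ising model (1) can be computed from subgraphs of $G$ [35], an easy calculation shows that

$$x_\Delta = \frac{(1+z(\theta))^\Delta-(1-z(\theta))^\Delta}{(1+z(\theta))^\Delta+(1-z(\theta))^\Delta}, \qquad (214)$$

where $z(\theta) = \tanh^2(\theta)$. Since $x_\Delta\to1$ with $\Delta$, any $\theta_T$ also goes to 0 with $\Delta$, and considering the slope and concavity of $x_\Delta(\theta)$ and $\tanh(\theta)$ for small $\theta$ it is easy to see that for large $\Delta$ there exists a unique solution $\theta_T(\Delta)$. Furthermore, the condition $x_{\Delta+1}(\theta) = y_{\Delta+1}(\theta)$ can now be written as

$$\tanh\theta = \sqrt{z(\theta)} = \frac{(1+z(\theta))^\Delta-(1-z(\theta))^\Delta}{(1+z(\theta))^\Delta+(1-z(\theta))^\Delta}. \qquad (215)$$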
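Both (213) and the closed form (214) can be compared with exhaustive enumeration on small instances. The sketch below assumes the structure described above: nodes 1 and 2 are not adjacent and are joined through $\Delta$ paths of length two, all with coupling $\theta$. Function names are hypothetical.

```python
import itertools
import math


def gp_correlations(delta, theta):
    """E(X1 X2) and E(X1 X3) by enumeration on the graph in which nodes 1 and 2 are
    joined through delta nodes of degree 2 (0-based: node 1 -> 0, node 2 -> 1, node 3 -> 2)."""
    n = delta + 2
    edges = [(i, j) for i in (0, 1) for j in range(2, n)]
    z = s12 = s13 = 0.0
    for x in itertools.product((-1, 1), repeat=n):
        w = math.exp(theta * sum(x[i] * x[j] for i, j in edges))
        z += w
        s12 += w * x[0] * x[1]
        s13 += w * x[0] * x[2]
    return s12 / z, s13 / z


def x_closed(delta, theta):
    """Closed form (214) for x_delta."""
    z = math.tanh(theta) ** 2
    a, b = (1 + z) ** delta, (1 - z) ** delta
    return (a - b) / (a + b)


def recursion_step(x, theta):
    """Recursion (213): correlations after adding one more degree-2 node."""
    t = math.tanh(theta)
    return (x + t * t) / (1 + t * t * x), (t * x + t) / (1 + t * t * x)


theta = 0.4
for delta in range(1, 6):
    x_next, y_next = recursion_step(x_closed(delta, theta), theta)
    print(delta + 1, gp_correlations(delta + 1, theta), (x_next, y_next))
```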
Assuming $z = K\Delta^{-\gamma}$, multiplying both sides of the previous equation by $\Delta^{\gamma/2}$ and taking the limit $\Delta\to\infty$, we obtain

$$K^{1/2} = \lim_{\Delta\to\infty}\Delta^{\gamma/2}\tanh\big(K\Delta^{1-\gamma}\big), \qquad (216)$$

which results in a non-trivial relation for $K$ only if $\gamma = 2$. In this case we get $K^{1/2} = K$, and thus, for any $\epsilon > 0$, if $\Delta$ is sufficiently large, there is a (unique) solution of (215) inside the interval $[(1-\epsilon)/\Delta^2,\ (1+\epsilon)/\Delta^2]$. Since $z(\theta) = \tanh^2(\theta)$, $\theta_T(\Delta)$ scales like $1/\Delta$, as we wanted to prove.
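The scaling $\theta_T(\Delta)\sim1/\Delta$ can also be observed numerically by solving (215) with bisection. The bracket $[10^{-9}, 2/\Delta]$ is an assumption of this sketch (it straddles the root for the values of $\Delta$ used here), and the right-hand side of (215) is evaluated as $\tanh(\Delta\operatorname{atanh} z)$, which is algebraically the same expression.

```python
import math


def theta_T(delta, iters=200):
    """Bisection for (215): tanh(theta) = x_delta(theta), with z = tanh(theta)**2.

    tanh(delta * atanh(z)) equals the closed form (214).
    The bracket [1e-9, 2/delta] is an assumption that holds for delta >= 4.
    """
    def f(theta):
        z = math.tanh(theta) ** 2
        return math.tanh(theta) - math.tanh(delta * math.atanh(z))

    lo, hi = 1e-9, 2.0 / delta
    assert f(lo) > 0 > f(hi), "bracket does not straddle a root"
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if f(mid) > 0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


for delta in (10, 100, 1000):
    print(delta, theta_T(delta), delta * theta_T(delta))
```

The product $\Delta\,\theta_T(\Delta)$ approaches 1, in line with $K = 1$ above.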